Aileen 2.0 - AI Office Agent

Welcome to Aileen 2.0! Aileen is an AI office agent designed to assist with specific tasks, currently focused on summarizing legislative proceedings broadcast by Germany's parliamentary TV. Leveraging recent advances in transformer technology, Aileen can navigate and adapt to changes in web environments autonomously, providing personalized summaries via email to authorized users.

Please note that Aileen 2.0 is specialized for this single use-case and is not (yet?) a general-purpose AI assistant.

Aileen 2 was created for 'NVIDIA and LangChain #GenerativeAI Agents Developer Contest'. Treat it as a technology preview.

Features and special techniques

  • Vision: can "look" at websites through Google PaliGemma and EasyOCR CRAFT-CDNN
  • Small Language Model support
  • Function-calling implemented through Python syntax (not JSON, as commonly used)
  • Uses Nvidia NIMs
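
To illustrate the Python-syntax function-calling feature: a model reply written as an ordinary Python call can be parsed with the standard `ast` module instead of a JSON decoder. The following is a minimal sketch under assumptions, not Aileen's actual implementation; the tool `click`, the `TOOLS` table and the helper names are hypothetical.

```python
import ast


def parse_call(text: str):
    """Parse a single Python-style call such as `name(arg, key=value)`.

    Returns (function_name, positional_args, keyword_args). Only literal
    arguments are accepted, so nothing in the model output is executed.
    """
    tree = ast.parse(text.strip(), mode="eval")
    call = tree.body
    if not isinstance(call, ast.Call) or not isinstance(call.func, ast.Name):
        raise ValueError("expected a plain function call")
    # literal_eval raises ValueError for anything that is not a literal
    args = [ast.literal_eval(a) for a in call.args]
    kwargs = {k.arg: ast.literal_eval(k.value) for k in call.keywords}
    return call.func.id, args, kwargs


# Hypothetical tool, for illustration only
def click(x: int, y: int) -> str:
    return f"clicked at ({x}, {y})"


TOOLS = {"click": click}


def dispatch(text: str):
    """Look up the named tool and call it with the parsed arguments."""
    name, args, kwargs = parse_call(text)
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](*args, **kwargs)
```

With these definitions, `dispatch('click(10, y=20)')` calls `click(10, y=20)`. Restricting arguments to literals is one way to keep model output from running arbitrary code.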

Prerequisites

  • Ubuntu GNU/Linux, Python 3
  • Nvidia NGC API key
  • Nvidia CUDA and PyTorch set up. Tested variants:
    • Option 1: Amazon Web Services:
      • AMI: "Deep Learning OSS Nvidia Driver AMI GPU PyTorch 2.3 (Ubuntu 20.04)"
        • activate environment with "conda activate pytorch"
      • Instance type: g4dn.xlarge
        • (provides Nvidia T4)
        • 100 GB root volume
        • optionally: Security Group settings that will allow inbound HTTP for Twilio SMS Webhook
    • Option 2: Vast.ai:
      • Instance type: 1x RTX A5000
      • Template: Nvcr.io/Nvidia/Pytorch
      • Disk: 60 GB
      • Launch mode: "Run interactive shell server, SSH"
      • (cli command:)

vastai create instance x --image nvcr.io/nvidia/pytorch:23.10-py3 --env '-e DATA_DIRECTORY=/workspace/' --disk 60.12945594969661 --ssh --direct

Installation

  1. ensure that your PyTorch-enabled environment is activated:
    • with the AWS Deep Learning AMI:
      • conda init
      • conda activate pytorch
    • with the Nvidia PyTorch container:
      • sudo apt install python3.10-venv
      • python3 -m venv venv
      • source venv/bin/activate
  2. download this repository:
  3. install dependencies:
    • cd aileen2; pip install -U -r requirements.txt
  4. install the transformer models
    1. option 1: if you have a direct link to PaliGemma MIX 448:
      1. at the console:
        • python3 ./setup_resources.py --vlm-model-url "https://..."
    2. option 2: install using Hugging Face:
      1. log in to Hugging Face and request access to the MIX checkpoint
      2. at the console:
        • huggingface-cli login
        • huggingface-cli download "google/paligemma-3b-mix-448"
      3. install the rest:
        • python3 ./setup_resources.py
  5. integrations are mocked up by default; to switch to the real implementations, set in config.json:
    • implementations/agent_lm: lm_ngc
    • implementations/vlm: vlm_hf
    • implementations/ocr: ocr_easyocr
  6. create a file named ".env" in this folder:
    • touch .env
  7. set Nvidia API key
    1. Option 1: in .env:
      • NVIDIA_API_KEY=nvapi-...
    2. Option 2: in config.json:
      • key "nvidia_api_key"
  8. optionally, if Aileen shall be reachable via SMS text messages ("Cloud office" option):
    1. create a Twilio account, get a Twilio phone number, retrieve Auth Token
    2. add Twilio Auth Token:
      1. Option 1: to .env:
        • TWILIO_AUTH_TOKEN=...
      2. Option 2: to config.json:
        • key "twilio_auth_token"
    3. in config.json, under "server", verify that "host" and "port" describe an address at which the Webhook for receiving notifications can be established
    4. in the Twilio management console (Phone Numbers -> Active Numbers -> (Number) -> "Configure" tab -> scroll down to "Messaging Configuration"), establish an HTTP POST Webhook at "http://(host):(port)/sms"
  9. optionally, if Aileen shall send results by email:
    1. set up Amazon Simple Email Service
    2. if in a sandboxed account, be sure to register any recipients under Identities
    3. for the IAM user, the managed policy "AmazonSESFullAccess" can be used
    4. add AWS Access Key and Secret Access Key to .env:
      • AWS_ACCESS_KEY_ID=A...
      • AWS_SECRET_ACCESS_KEY=B...
  10. add user profile(s) to config.json under the key "users":
    "users": {
        "+18005550100": {
            "profile": "Software Engineer",
            "email": "[email protected]"
        }
    }

The key (+18005550100 in this example) is the user's phone number. The "profile" value is the profile text used for the personalized summary, and "email" is the recipient's email address.
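
How these settings could be read at runtime is sketched below. This is an illustration under assumptions, not the project's code: it assumes that an environment variable takes precedence over config.json and that "nvidia_api_key" sits at the top level of config.json. The helper names `resolve_secret` and `find_user` are hypothetical, and the email address is a placeholder.

```python
import json
import os


def resolve_secret(config: dict, env_name: str, config_key: str, environ=os.environ):
    """Return a secret from the environment, falling back to config.json."""
    value = environ.get(env_name) or config.get(config_key)
    if not value:
        raise KeyError(f"set {env_name} in .env or '{config_key}' in config.json")
    return value


def find_user(config: dict, phone_number: str):
    """Look up a user profile by phone number; None if the number is unknown."""
    return config.get("users", {}).get(phone_number)


# Example configuration with the same shape as the "users" snippet above
config = json.loads("""
{
    "nvidia_api_key": "nvapi-from-config",
    "users": {
        "+18005550100": {"profile": "Software Engineer", "email": "user@example.com"}
    }
}
""")

user = find_user(config, "+18005550100")
if user is not None:
    print(user["profile"], user["email"])
```

A call such as `resolve_secret(config, "NVIDIA_API_KEY", "nvidia_api_key")` would then cover both options from step 7, and the same helper could serve the Twilio Auth Token.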

Advanced Configuration

When the mock-ups have been replaced by the real implementations (see Installation above), the language models Llama3-8B-Instruct (for the agent) and Gemma-7B (for summarization) are used by default. This can be changed in config.json to other models offered through the Nvidia NGC Model Catalog, although not all of them are supported: the list of supported models can be found in lm.py ("get_ctx_len_for_model"). The tokenizer of each model needs to be accessible. For gated models, this can be achieved either by obtaining access via Hugging Face (and setting HF_TOKEN in .env) or by establishing a publicly accessible copy in lm.py ("get_tokenizer_for_model").

Running

  1. optionally, set TRANSFORMERS_OFFLINE=1 if a gated model (like PaliGemma) is to be used without Hugging Face access, e.g.:
    • export TRANSFORMERS_OFFLINE=1
  2. Option 1: on local device:
  3. Option 2: on server ("cloud office"):
    • python3 ./server.py
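
In server mode, Twilio delivers each incoming SMS as a form-encoded HTTP POST to the /sms Webhook. The sketch below shows how such a request body could be decoded and how the `X-Twilio-Signature` header could be checked against the Auth Token. Whether server.py performs this check is an assumption, all function names are hypothetical, and the URL in the usage note is a placeholder.

```python
import base64
import hashlib
import hmac
from urllib.parse import parse_qs


def parse_sms(body: bytes) -> dict:
    """Decode the form-encoded POST body of an incoming-message Webhook."""
    fields = parse_qs(body.decode("utf-8"), keep_blank_values=True)
    # Each field occurs once in practice; keep the first value only
    return {key: values[0] for key, values in fields.items()}


def twilio_signature(auth_token: str, url: str, params: dict) -> str:
    """Compute the signature Twilio attaches to a Webhook request.

    The full URL is followed by each POST parameter name and value,
    sorted by name, then signed with HMAC-SHA1 and Base64-encoded.
    """
    payload = url + "".join(key + params[key] for key in sorted(params))
    digest = hmac.new(auth_token.encode(), payload.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()


def is_authentic(auth_token: str, url: str, params: dict, header_value: str) -> bool:
    """Compare the expected signature with the received header value."""
    expected = twilio_signature(auth_token, url, params)
    return hmac.compare_digest(expected, header_value)
```

For a body such as `b"From=%2B18005550100&Body=Hello+Aileen"`, `parse_sms` yields the sender number under "From" and the message text under "Body". The sender can then be matched against the "users" key in config.json.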

Anticipated questions

SessionNotCreatedException

Error message:

selenium.common.exceptions.SessionNotCreatedException: Message: session not created: Chrome failed to start: exited normally. (session not created: DevToolsActivePort file doesn't exist) (The process started from chrome location /home/ubuntu/.cache/selenium/chrome/linux64/126.0.6478.61/chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)

Remedy: check that Chrome is installed and Selenium is set up (see Prerequisites above)

Access to model is restricted

Error message:

Access to model google/paligemma-3b-mix-448 is restricted. You must be authenticated to access it.

Remedy: once PaliGemma has been downloaded to the Hugging Face cache (either through a direct download with setup_resources.py or through huggingface-cli, see above), both main.py and server.py can be run with the TRANSFORMERS_OFFLINE environment variable set, e.g. in the file .env:

TRANSFORMERS_OFFLINE=1
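
For illustration, a minimal standard-library loader for such a .env file could look like the sketch below. How Aileen actually reads .env is not stated here, so the function `load_env_file` is hypothetical. Libraries typically read offline-mode variables when they are imported, so a loader like this would have to run before `transformers` is imported.

```python
import os


def load_env_file(path: str = ".env") -> dict:
    """Read KEY=VALUE lines from a .env file into os.environ.

    Blank lines and lines starting with '#' are skipped. Variables that
    are already set in the environment are left untouched.
    """
    loaded = {}
    with open(path, encoding="utf-8") as fh:
        for raw in fh:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            key = key.strip()
            value = value.strip().strip('"').strip("'")
            if key not in os.environ:
                os.environ[key] = value
                loaded[key] = value
    return loaded
```

The function returns only the variables it actually set, which makes it easy to log what came from the file and what was already exported in the shell.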

If setup_resources.py failed after a successful direct download of PaliGemma, you can re-run it with the already downloaded tar file in place of the direct-download URL:

python3 ./setup_resources.py --vlm-model-url /tmp/paligemma.tar

module 'cv2.dnn' has no attribute 'DictValue'

Error message:

AttributeError: module 'cv2.dnn' has no attribute 'DictValue'

Remedy: set up a fresh venv. Assuming the Nvidia PyTorch container:

apt install python3.10-venv
python3 -m venv venv
source venv/bin/activate

Then repeat the installation procedure described above.

Acknowledgments

The PaliGemma parser in paligemma/ was taken from the Big Vision repository, where it was released under the Apache-2.0 license.

Contributors

  • ndurner
