bcg-x-official / artkit

Automated prompt-based testing and evaluation of Gen AI applications

Home Page: https://bcg-x-official.github.io/artkit/_generated/home.html

License: Apache License 2.0

Languages: Jupyter Notebook 97.83%, Python 2.14%, Shell 0.01%, CSS 0.01%, JavaScript 0.01%
Topics: asyncio, gen-ai, genai, python, red-teaming, test-automation, data-science

artkit's Introduction

ARTKIT logo

Automated Red Teaming (ART) and testing toolkit

ARTKIT is a Python framework developed by BCG X for automating prompt-based testing and evaluation of Gen AI applications.

Documentation | User Guides | Examples

Getting started

Introduction

ARTKIT is a Python framework for developing automated end-to-end testing and evaluation pipelines for Gen AI applications. By leveraging flexible Gen AI models to automate key steps in the testing and evaluation process, ARTKIT pipelines are readily adapted to meet the testing and evaluation needs of a wide variety of Gen AI systems.

ARTKIT pipeline schematic

ARTKIT also supports automated multi-turn conversations between a challenger bot and a target system. Issues and vulnerabilities are more likely to arise after extended interactions with Gen AI systems, so multi-turn testing is critical for interactive applications.

We recommend starting with our User Guide to learn the core concepts and functionality of ARTKIT. Visit our Examples to see how ARTKIT can be used to test and evaluate Gen AI systems for:

  1. Q&A Accuracy
  2. Upholding Brand Values:
    • Implement persona-based testing to simulate diverse users interacting with your system and evaluate system responses for brand conformity.
  3. Equitability:
    • Run a counterfactual experiment by systematically modifying demographic indicators across a set of documents, and statistically evaluate system responses for undesired demographic bias.
  4. Safety
  5. Security

These are just a few examples of the many ways ARTKIT can be used to test and evaluate Gen AI systems for proficiency, equitability, safety, and security.

Key Features

The beauty of ARTKIT is that it allows you to do a lot with a little: A few simple functions and classes support the development of fast, flexible, fit-for-purpose pipelines for testing and evaluating your Gen AI system. Key features include:

  • Simple API: ARTKIT provides a small set of simple but powerful functions that support customized pipelines to test and evaluate virtually any Gen AI system.
  • Asynchronous: Leverage asynchronous processing to speed up processes that depend heavily on API calls.
  • Caching: Manage development costs by caching API responses to reduce the number of calls to external services.
  • Model Agnostic: ARTKIT supports connecting to major Gen AI model providers and allows users to develop new model classes to connect to any Gen AI service.
  • End-to-End Pipelines: Build end-to-end flows to generate test prompts, interact with a target system (i.e., system being tested), perform quantitative evaluations, and structure results for reporting.
  • Multi-Turn Conversations: Create automated interactive dialogs between a target system and an LLM persona programmed to interact with the target system in pursuit of a specific goal.
  • Robust Data Flows: Automatically track the flow of data through testing and evaluation pipelines, facilitating full traceability of data lineage in the results.
  • Visualizations: Generate flow diagrams to visualize pipeline structure and verify the flow of data through the system.

Note

ARTKIT is designed to be customized by data scientists and engineers to enhance human-in-the-loop testing and evaluation. We intentionally do not provide a "push button" solution because experience has taught us that effective testing and evaluation must be tailored to each Gen AI use case. Automation is a strategy for scaling and accelerating testing and evaluation, not a substitute for case-specific risk landscape mapping, domain expertise, and critical thinking.

Supported Model Providers

ARTKIT provides out-of-the-box support for major Gen AI model providers, including OpenAI and Hugging Face.

To connect to other services, users can develop new model classes.

Installation

ARTKIT supports both PyPI and Conda installations. We recommend installing ARTKIT in a dedicated virtual environment.

Pip

MacOS and Linux:

python -m venv artkit
source artkit/bin/activate
pip install artkit

Windows:

python -m venv artkit
artkit\Scripts\activate.bat
pip install artkit

Conda

conda install -c conda-forge artkit

Optional dependencies

To enable visualizations of pipeline flow diagrams, install GraphViz and ensure it is on your system's PATH.
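
For example (the package-manager commands below are assumptions about your platform; any standard GraphViz installation works):

# macOS (Homebrew)
brew install graphviz

# Debian/Ubuntu
sudo apt-get install graphviz

# Conda
conda install -c conda-forge graphviz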

Environment variables

Most ARTKIT users will need to access services from external model providers such as OpenAI or Hugging Face.

Our recommended approach is:

  1. Install python-dotenv using pip:
pip install python-dotenv

or conda:

conda install -c conda-forge python-dotenv
  2. Create a file named .env in your project root.
  3. Add .env to your .gitignore to ensure it is not committed to your Git repo.
  4. Define environment variables inside .env, for example, API_KEY=your_api_key
  5. In your Python scripts or notebooks, load the environment variables with:
from dotenv import load_dotenv
load_dotenv()

# Verify that the environment variable is loaded
import os
os.getenv('API_KEY')

The ARTKIT repository includes an example file called .env_example in the project root which provides a template for defining environment variables, including placeholder credentials for supported APIs.

To encourage secure storage of credentials, ARTKIT model classes do not accept API credentials directly, but instead require environment variables to be defined. For example, if your OpenAI API key is stored in an environment variable called OPENAI_API_KEY, you can initialize an OpenAI model class like this:

import artkit.api as ak

ak.OpenAIChat(
    model_id="gpt-4o",
    api_key_env="OPENAI_API_KEY"
    )

The api_key_env parameter accepts the name of the environment variable as a string instead of the API key itself. This reduces the risk of accidentally exposing API keys in code repositories, since the key is never stored as a Python object that can be printed.

Quick Start

The core ARTKIT functions are:

  1. run: Execute one or more pipeline steps
  2. step: A single pipeline step which produces a dictionary or an iterable of dictionaries
  3. chain: A set of steps that run in sequence
  4. parallel: A set of steps that run in parallel

Below, we develop a simple example pipeline with the following steps:

  1. Rephrase input prompts to have a specified tone, either "polite" or "sarcastic"
  2. Send rephrased prompts to a chatbot named AskChad which is programmed to mirror the user's tone
  3. Evaluate the responses according to a "sarcasm" metric

To begin, import artkit.api and set up a session with the OpenAI GPT-4o model. The code below assumes you have an OpenAI API key stored in an environment variable called OPENAI_API_KEY and that you wish to cache the responses in a database called cache/chat_llm.db.

import artkit.api as ak

# Set up a chat system with the OpenAI GPT-4o model
chat_llm = ak.CachedChatModel(
    model=ak.OpenAIChat(model_id="gpt-4o"),
    database="cache/chat_llm.db"
)

Next, define a few functions that will be used as pipeline steps. ARTKIT is designed to work with asynchronous generators to allow for asynchronous processing, so the functions below are defined with async, await, and yield keywords.

# A function that rephrases input prompts to have a specified tone
async def rephrase_tone(prompt: str, tone: str, llm: ak.ChatModel):

    response = await llm.get_response(
        message = (
            f"Your job is to rephrase in input question to have a {tone} tone.\n"
            f"This is the question you must rephrase:\n{prompt}"
        )
    )

    yield {"prompt": response[0], "tone": tone}


# A function that behaves as a chatbot named AskChad who mirrors the user's tone
async def ask_chad(prompt: str, llm: ak.ChatModel):

    response = await llm.get_response(
        message = (
            "You are AskChad, a chatbot that mirrors the user's tone. "
            "For example, if the user is rude, you are rude. "
            "Your responses contain no more than 10 words.\n"
            f"Respond to this user input:\n{prompt}"
        )
    )

    yield {"response": response[0]}


# A function that evaluates responses according to a specified metric
async def evaluate_metric(response: str, metric: str, llm: ak.ChatModel):

    score = await llm.get_response(
        message = (
            f"Your job is to evaluate prompts according to whether they are {metric}. "
            f"If the input prompt is {metric}, return 1, otherwise return 0.\n"
            f"Please evaluate the following prompt:\n{response}"
        )
    )

    yield {"evaluation_metric": metric, "score": int(score[0])}

Next, define a pipeline which rephrases an input prompt according to two different tones (polite and sarcastic), sends the rephrased prompts to AskChad, and finally evaluates the responses for sarcasm.

pipeline = (
    ak.chain(
        ak.parallel(
            ak.step("tone_rephraser", rephrase_tone, tone="POLITE", llm=chat_llm),
            ak.step("tone_rephraser", rephrase_tone, tone="SARCASTIC", llm=chat_llm),
        ),
        ak.step("ask_chad", ask_chad, llm=chat_llm),
        ak.step("evaluation", evaluate_metric, metric="SARCASTIC", llm=chat_llm)
    )
)

pipeline.draw()

(Flow diagram: sphinx/source/_images/quick_start_flow_diagram.png)

Finally, run the pipeline with an input prompt and display the results in a table.

# Input to run through the pipeline
prompt = {"prompt": "What is a fun activity to do in Boston?"}

# Run pipeline
result = ak.run(steps=pipeline, input=prompt)

# Convert results dictionary into a multi-column dataframe
result.to_frame()

(Results table: sphinx/source/_images/quick_start_results.png)

From left to right, the results table shows:

  1. input: The original input prompt
  2. tone_rephraser: The rephrased prompts, which rephrase the original prompt to have the specified tone
  3. ask_chad: The response from AskChad, which mirrors the tone of the user
  4. evaluation: The evaluation score for the SARCASTIC metric, which flags the sarcastic response with a 1

For a complete introduction to ARTKIT, please visit our User Guide and Examples.

Contributing

Contributions to ARTKIT are welcome and appreciated! Please see the Contributor Guide section for information.

License

This project is licensed under Apache 2.0, allowing free use, modification, and distribution with added protections against patent litigation. See the LICENSE file for more details or visit Apache 2.0.

BCG X

BCG X is the tech build and design unit of Boston Consulting Group.

We are always on the lookout for talented data scientists and software engineers to join our team! Visit BCG X Careers to learn more.

artkit's People

Contributors

alexanderlontke, betcherj, breakbotz, eltociear, j-ittner, matthew-wong-bcg, rgriff23, seanggani


artkit's Issues

Add Badges

Issue

Currently, the README.rst file does not display any badges for the repository. Some badges are defined in pypi_description.rst, but the list is incomplete (see below).

Additional details on the .rst file format are located here

Specifically, we'd like to add badges for the following:

  • pypi
  • anaconda version
  • python versions
  • code style (black)
  • sphinx (doc build)
  • license (apache)
  • GitHub Action build status
  • Contributor Covenant

Here's an example with all the badges we want to replicate. Note: the example uses Azure DevOps to add the test coverage badge, but this repo uses GitHub Actions, so a different approach is required and will entail changes to artkit-release-pipeline.yml. Given it's a more complex change, a separate issue documents the approach.

Solution

  • Update README.rst and pypi_description.rst with the relevant badges
    • Add the following where the badges should be displayed (example)
.. Begin-Badges

|pypi| |conda| |python_versions| |code_style| |made_with_sphinx_doc| |license_badge| |Contributor_Convenant|

.. End-Badges
    • Add the targets for the URLs where the badges should point to (example)
.. Begin-Badges

.. |pypi| image:: https://badge.fury.io/py/artkit.svg
    :target: https://pypi.org/project/artkit/

.. |conda| image:: https://anaconda.org/bcg_gamma/gamma-facet/badges/version.svg
    :target: https://anaconda.org/BCG_Gamma/artkit

.. |python_versions| image:: https://img.shields.io/badge/python-3.10|3.11|3.12-blue.svg
   :target: https://www.python.org/downloads/release/python-3100/

.. |code_style| image:: https://img.shields.io/badge/code%20style-black-000000.svg
   :target: https://github.com/psf/black

.. |made_with_sphinx_doc| image:: https://img.shields.io/badge/Made%20with-Sphinx-1f425f.svg
   :target: https://bcg-x-official.github.io/facet/index.html

.. |license_badge| image:: https://img.shields.io/badge/License-Apache%202.0-olivegreen.svg
   :target: https://opensource.org/licenses/Apache-2.0

.. image:: https://github.com/BCG-X-Official/artkit/actions/workflows/artkit-release-pipeline.yml/badge.svg
    :target: https://github.com/BCG-X-Official/artkit/actions/workflows/artkit-release-pipeline.yml
    :alt: ARTKIT Release Pipeline

.. |Contributor_Convenant| image:: https://img.shields.io/badge/Contributor%20Covenant-2.1-4baaaa.svg 
   :target: CODE_OF_CONDUCT.md

.. End-Badges
  • Update the RELEASE_NOTES.rst file:
    • Increment a minor version here

Add LLM connector for Amazon Bedrock API

Justification

Amazon Bedrock provides an extensive suite of Gen AI models via a single API. Supporting Amazon Bedrock will significantly enhance ARTKIT's versatility.

Details

Note that the Bedrock API supports multiple modalities, depending on the selected model. This issue is for LLMs. The new class will live in src/artkit/model/llm/aws/

Be sure to update the docs:

Please see the tutorial: Creating New Model Classes

Test bug report

Describe the bug

A clear and concise description of what the bug is.

To Reproduce

Steps to reproduce the behavior:

  1. Go to '...'
  2. Click on '....'
  3. Scroll down to '....'
  4. See error

Expected behavior

A clear and concise description of what you expected to happen.

Screenshots

If applicable, add screenshots to help explain your problem.

Desktop (please complete the following information):

  • OS: Mac/Windows
  • Version [e.g. 22]

Additional context

Add any other context about the problem here.

Expose ChatbotModel as part of default artkit.api export

Is your feature request related to a problem? Please describe.

Previously, classes in model.llm.base such as ChatModel were reachable through artkit.api, and it has been popular to use ak.ChatModel for type annotations in notebooks. However, this is no longer possible because from .model.llm.base import * is missing from api.py.

Describe the solution you'd like

Selectively include classes from .base modules, such as ChatModel: from .model.llm.base import ChatModel
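
For context, below is a minimal sketch of the notebook pattern this would restore, based on how ak.ChatModel is used in the Quick Start (and assuming the import is added as proposed):

import artkit.api as ak

# Annotate a pipeline step with the ChatModel base type
async def my_step(prompt: str, llm: ak.ChatModel):
    response = await llm.get_response(message=prompt)
    yield {"response": response[0]}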

Describe alternatives you've considered

An alternative is to import everything from the base modules, which might be too broad.

Add links to Jupyter notebooks on GitHub at the top of Sphinx tutorials

Problem

The tutorials in our Sphinx documentation are HTML pages built from Jupyter notebooks. Readers may want to download the Jupyter notebook so they can run the code, but it isn't clear where they can find the notebook.

Solution

Within each notebook tutorial, include a hyperlink to the Jupyter notebook on GitHub. From GitHub, the user can choose to download the notebook and explore related files (e.g., data or cache files which are used by the notebook).

Details

Place hyperlinks just beneath the main title of notebooks, e.g.:

# Building Your First ARTKIT Pipeline

[View notebook on GitHub](https://github.com/BCG-X-Official/artkit/blob/1.0.x/sphinx/source/user_guide/introduction_to_artkit/building_your_first_artkit_pipeline.ipynb)

Add hyperlinks to all Jupyter notebooks which are included in the User Guide and Examples sections of the Sphinx documentation.

In the artkit GitHub repo, you can find these files within:

  • sphinx/source/user_guide/
  • sphinx/source/examples/

Verifying your changes

  • Double check that you included correct hyperlinks for all notebook tutorials in the User Guide and Examples
  • Note that some docs pages are built from RST files instead of notebooks (e.g., the index pages for the User Guide and Examples, the intro to Gen AI Testing and Evaluation in the User Guide). These pages do not need hyperlinks.

Test issue notification

(Unfilled default bug-report template, identical to the "Test bug report" issue above.)

Add connector for any HTTP endpoint

Is your feature request related to a problem? Please describe.

The only connectors available today are for specific model providers with clearly defined APIs. There is no way for a user to specify an arbitrary HTTP endpoint to get a response.

Describe the solution you'd like

I want to be able to use ARTKIT with any endpoint that accepts a payload.

Describe alternatives you've considered

None

Additional context

It seems like a version of the existing HuggingFace connector could be leveraged.
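
As a rough illustration, a generic connector could wrap an async HTTP client behind the same get_response interface used in the Quick Start. Everything below is an assumption (the class name, payload schema, and base-class contract are not part of the current ARTKIT API), sketched here with httpx:

import httpx

class HTTPEndpointConnector:
    """Illustrative sketch only; not part of the ARTKIT API."""

    def __init__(self, url: str, timeout: float = 30.0) -> None:
        self.url = url
        self.timeout = timeout

    async def get_response(self, message: str) -> list[str]:
        # POST the prompt as JSON; the request and response schemas are
        # assumptions and must be adapted to the target endpoint
        async with httpx.AsyncClient(timeout=self.timeout) as client:
            resp = await client.post(self.url, json={"prompt": message})
            resp.raise_for_status()
            return [resp.json()["response"]]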

`artkit.api` does not import `CachedDiffusionModel` and `CachedVisionModel`

Describe the bug

artkit.api does not import CachedDiffusionModel and CachedVisionModel.

To Reproduce

Steps to reproduce the behavior:

  1. Run the Connecting to Gen AI Models notebook: https://github.com/BCG-X-Official/artkit/blob/1.0.x/sphinx/source/user_guide/introduction_to_artkit/connecting_to_genai_models.ipynb
  2. The OpenAI "text-to-image" and "vision" sections will fail

Expected behavior

CachedDiffusionModel and CachedVisionModel should be imported with artkit.api

Additional context

The fix is to update the following imports in src/artkit/api.py, which currently do not seem to import anything:

from .model.diffusion import *
from .model.vision import *

to:

from .model.diffusion.base import *
from .model.vision.base import *

Add diffusion connector for Amazon Bedrock API

Justification

Amazon Bedrock provides an extensive suite of Gen AI models via a single API. Supporting Amazon Bedrock will significantly enhance ARTKIT's versatility.

Details

Note that the Bedrock API supports multiple modalities, depending on the selected model. This issue is for text-to-image diffusion models. The new class will live in src/artkit/model/diffusion/aws/

Be sure to update the docs:

Please see the tutorial: Creating New Model Classes

Third issue notification test

(Unfilled default bug-report template, identical to the "Test bug report" issue above.)

Add LLM connector for Cohere

Justification

Cohere focuses on providing low-latency LLMs. They offer an async endpoint we can use, as well as free API keys.

Details

The new class will live in src/artkit/model/llm/cohere/

Be sure to update the docs:

Please see the tutorial: Creating New Model Classes

New test for issue notification

(Unfilled default bug-report template, identical to the "Test bug report" issue above.)

Support clearing cached results created or accessed after a given `datetime`

Issue

Currently, CacheDB.clear allows users to clear cached results which were accessed or created before a given datetime.

However, it is not possible to clear results after a given datetime. This would be useful when I make a mistake midway through an analysis and I want to clear only the most recent results from my cache.
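
A hypothetical sketch of the desired call, mirroring the existing before-style filter (the created_after parameter name is illustrative, not part of the current API):

from datetime import datetime, timedelta

# cache: an existing CacheDB instance
# Hypothetical: clear only results created within the last hour
cache.clear(created_after=datetime.now() - timedelta(hours=1))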

Solution

  • Update the CacheDB class
  • Update the CachedGenAIModel class (which uses CacheDB)

Details

Once implemented, be sure to:

Docs: Progress bar for pipelines

We plan to add an optional progress bar to the run method in the dependency, fluxus: BCG-X-Official/fluxus#23

It will then be available to the run method in artkit and we should introduce it in the "Building Your First ARTKIT Pipeline" tutorial, specifically in the section where we introduce the core ARTKIT functions.

Once the progress bar is added and we have updated the fluxus version in artkit, we will re-open this issue.
