bcg-x-official / artkit

Automated prompt-based testing and evaluation of Gen AI applications

Home Page: https://bcg-x-official.github.io/artkit/_generated/home.html

License: Apache License 2.0

Languages: Jupyter Notebook 97.83%, Python 2.14%, Shell 0.01%, CSS 0.01%, JavaScript 0.01%
Topics: asyncio, gen-ai, genai, python, red-teaming, test-automation, data-science

artkit's Introduction

ARTKIT logo

Automated Red Teaming (ART) and testing toolkit

ARTKIT is a Python framework developed by BCG X for automating prompt-based testing and evaluation of Gen AI applications.

Documentation | User Guides | Examples

Getting started

Introduction

ARTKIT is a Python framework for developing automated end-to-end testing and evaluation pipelines for Gen AI applications. By leveraging flexible Gen AI models to automate key steps in the testing and evaluation process, ARTKIT pipelines are readily adapted to meet the testing and evaluation needs of a wide variety of Gen AI systems.

ARTKIT pipeline schematic

ARTKIT also supports automated multi-turn conversations between a challenger bot and a target system. Issues and vulnerabilities are more likely to arise after extended interactions with Gen AI systems, so multi-turn testing is critical for interactive applications.

We recommend starting with our User Guide to learn the core concepts and functionality of ARTKIT. Visit our Examples to see how ARTKIT can be used to test and evaluate Gen AI systems for:

  1. Q&A Accuracy
  2. Upholding Brand Values:
    • Implement persona-based testing to simulate diverse users interacting with your system and evaluate system responses for brand conformity.
  3. Equitability:
    • Run a counterfactual experiment by systematically modifying demographic indicators across a set of documents, and statistically evaluate system responses for undesired demographic bias.
  4. Safety
  5. Security

These are just a few examples of the many ways ARTKIT can be used to test and evaluate Gen AI systems for proficiency, equitability, safety, and security.

Key Features

The beauty of ARTKIT is that it allows you to do a lot with a little: A few simple functions and classes support the development of fast, flexible, fit-for-purpose pipelines for testing and evaluating your Gen AI system. Key features include:

  • Simple API: ARTKIT provides a small set of simple but powerful functions that support customized pipelines to test and evaluate virtually any Gen AI system.
  • Asynchronous: Leverage asynchronous processing to speed up processes that depend heavily on API calls.
  • Caching: Manage development costs by caching API responses to reduce the number of calls to external services.
  • Model Agnostic: ARTKIT supports connecting to major Gen AI model providers and allows users to develop new model classes to connect to any Gen AI service.
  • End-to-End Pipelines: Build end-to-end flows to generate test prompts, interact with a target system (i.e., system being tested), perform quantitative evaluations, and structure results for reporting.
  • Multi-Turn Conversations: Create automated interactive dialogs between a target system and an LLM persona programmed to interact with the target system in pursuit of a specific goal.
  • Robust Data Flows: Automatically track the flow of data through testing and evaluation pipelines, facilitating full traceability of data lineage in the results.
  • Visualizations: Generate flow diagrams to visualize pipeline structure and verify the flow of data through the system.

Note

ARTKIT is designed to be customized by data scientists and engineers to enhance human-in-the-loop testing and evaluation. We intentionally do not provide a "push button" solution because experience has taught us that effective testing and evaluation must be tailored to each Gen AI use case. Automation is a strategy for scaling and accelerating testing and evaluation, not a substitute for case-specific risk landscape mapping, domain expertise, and critical thinking.

Supported Model Providers

ARTKIT provides out-of-the-box support for major Gen AI model providers, including OpenAI and Hugging Face.

To connect to other services, users can develop new model classes.

Installation

ARTKIT supports both PyPI and Conda installations. We recommend installing ARTKIT in a dedicated virtual environment.

Pip

MacOS and Linux:

python -m venv artkit
source artkit/bin/activate
pip install artkit

Windows:

python -m venv artkit
artkit\Scripts\activate.bat
pip install artkit

Conda

conda install -c conda-forge artkit

Optional dependencies

To enable visualizations of pipeline flow diagrams, install GraphViz and ensure it is on your system's PATH.
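
For example (the package-manager commands below are assumptions about your platform; any standard GraphViz installation works):

# macOS (Homebrew)
brew install graphviz

# Debian/Ubuntu
sudo apt-get install graphviz

# Conda
conda install -c conda-forge graphviz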

Environment variables

Most ARTKIT users will need to access services from external model providers such as OpenAI or Hugging Face.

Our recommended approach is:

  1. Install python-dotenv using pip:
pip install python-dotenv

or conda:

conda install -c conda-forge python-dotenv
  2. Create a file named .env in your project root.
  3. Add .env to your .gitignore to ensure it is not committed to your Git repo.
  4. Define environment variables inside .env, for example, API_KEY=your_api_key
  5. In your Python scripts or notebooks, load the environment variables with:
from dotenv import load_dotenv
load_dotenv()

# Verify that the environment variable is loaded
import os
os.getenv('API_KEY')

The ARTKIT repository includes an example file called .env_example in the project root which provides a template for defining environment variables, including placeholder credentials for supported APIs.

To encourage secure storage of credentials, ARTKIT model classes do not accept API credentials directly, but instead require environment variables to be defined. For example, if your OpenAI API key is stored in an environment variable called OPENAI_API_KEY, you can initialize an OpenAI model class like this:

import artkit.api as ak

ak.OpenAIChat(
    model_id="gpt-4o",
    api_key_env="OPENAI_API_KEY"
    )

The api_key_env parameter accepts the name of the environment variable as a string instead of the API key itself. This reduces the risk of accidentally exposing API keys in code repositories, since the key is never stored as a Python object that can be printed.

Quick Start

The core ARTKIT functions are:

  1. run: Execute one or more pipeline steps
  2. step: A single pipeline step which produces a dictionary or an iterable of dictionaries
  3. chain: A set of steps that run in sequence
  4. parallel: A set of steps that run in parallel

Below, we develop a simple example pipeline with the following steps:

  1. Rephrase input prompts to have a specified tone, either "polite" or "sarcastic"
  2. Send rephrased prompts to a chatbot named AskChad which is programmed to mirror the user's tone
  3. Evaluate the responses according to a "sarcasm" metric

To begin, import artkit.api and set up a session with the OpenAI GPT-4o model. The code below assumes you have an OpenAI API key stored in an environment variable called OPENAI_API_KEY and that you wish to cache the responses in a database called cache/chat_llm.db.

import artkit.api as ak

# Set up a chat system with the OpenAI GPT-4o model
chat_llm = ak.CachedChatModel(
    model=ak.OpenAIChat(model_id="gpt-4o"),
    database="cache/chat_llm.db"
)

Next, define a few functions that will be used as pipeline steps. ARTKIT is designed to work with asynchronous generators to allow for asynchronous processing, so the functions below are defined with async, await, and yield keywords.

# A function that rephrases input prompts to have a specified tone
async def rephrase_tone(prompt: str, tone: str, llm: ak.ChatModel):

    response = await llm.get_response(
        message = (
            f"Your job is to rephrase in input question to have a {tone} tone.\n"
            f"This is the question you must rephrase:\n{prompt}"
        )
    )

    yield {"prompt": response[0], "tone": tone}


# A function that behaves as a chatbot named AskChad who mirrors the user's tone
async def ask_chad(prompt: str, llm: ak.ChatModel):

    response = await llm.get_response(
        message = (
            "You are AskChad, a chatbot that mirrors the user's tone. "
            "For example, if the user is rude, you are rude. "
            "Your responses contain no more than 10 words.\n"
            f"Respond to this user input:\n{prompt}"
        )
    )

    yield {"response": response[0]}


# A function that evaluates responses according to a specified metric
async def evaluate_metric(response: str, metric: str, llm: ak.ChatModel):

    score = await llm.get_response(
        message = (
            f"Your job is to evaluate prompts according to whether they are {metric}. "
            f"If the input prompt is {metric}, return 1, otherwise return 0.\n"
            f"Please evaluate the following prompt:\n{response}"
        )
    )

    yield {"evaluation_metric": metric, "score": int(score[0])}

Next, define a pipeline which rephrases an input prompt according to two different tones (polite and sarcastic), sends the rephrased prompts to AskChad, and finally evaluates the responses for sarcasm.

pipeline = (
    ak.chain(
        ak.parallel(
            ak.step("tone_rephraser", rephrase_tone, tone="POLITE", llm=chat_llm),
            ak.step("tone_rephraser", rephrase_tone, tone="SARCASTIC", llm=chat_llm),
        ),
        ak.step("ask_chad", ask_chad, llm=chat_llm),
        ak.step("evaluation", evaluate_metric, metric="SARCASTIC", llm=chat_llm)
    )
)

pipeline.draw()

(Flow diagram: sphinx/source/_images/quick_start_flow_diagram.png)

Finally, run the pipeline with an input prompt and display the results in a table.

# Input to run through the pipeline
prompt = {"prompt": "What is a fun activity to do in Boston?"}

# Run pipeline
result = ak.run(steps=pipeline, input=prompt)

# Convert results dictionary into a multi-column dataframe
result.to_frame()

(Results table: sphinx/source/_images/quick_start_results.png)

From left to right, the results table shows:

  1. input: The original input prompt
  2. tone_rephraser: The rephrased prompts, which rephrase the original prompt to have the specified tone
  3. ask_chad: The response from AskChad, which mirrors the tone of the user
  4. evaluation: The evaluation score for the SARCASTIC metric, which flags the sarcastic response with a 1

For a complete introduction to ARTKIT, please visit our User Guide and Examples.

Contributing

Contributions to ARTKIT are welcome and appreciated! Please see the Contributor Guide section for information.

License

This project is licensed under Apache 2.0, allowing free use, modification, and distribution with added protections against patent litigation. See the LICENSE file for more details or visit Apache 2.0.

BCG X

BCG X is the tech build and design unit of Boston Consulting Group.

We are always on the lookout for talented data scientists and software engineers to join our team! Visit BCG X Careers to learn more.

artkit's People

Contributors

alexanderlontke, betcherj, breakbotz, eltociear, j-ittner, matthew-wong-bcg, rgriff23, seanggani


artkit's Issues

Add Badges

Issue

Currently, the README.rst file does not display any badges for the repository. Some badges are defined in pypi_description.rst, but the list is incomplete (see below).

Additional details on the .rst file format are located here

Specifically, we'd like to add badges for the following:

  • pypi
  • anaconda version
  • python versions
  • code style (black)
  • sphinx (doc build)
  • license (apache)
  • GitHub Action build status
  • Contributor Covenant

Here's an example with all the badges we want to replicate. Note: the example uses Azure DevOps to add the test coverage badge, but this repo uses GitHub Actions, so a different approach is required and will entail changes to artkit-release-pipeline.yml. Given it's a more complex change, a separate issue documents the approach.

Solution

  • Update README.rst and pypi_description.rst with the relevant badges
    • Add the following where the badges should be displayed (example)
.. Begin-Badges

|pypi| |conda| |python_versions| |code_style| |made_with_sphinx_doc| |license_badge| |Contributor_Convenant|

.. End-Badges
    • Add the targets for the URLs where the badges should point to (example)
.. Begin-Badges

.. |pypi| image:: https://badge.fury.io/py/artkit.svg
    :target: https://pypi.org/project/artkit/

.. |conda| image:: https://anaconda.org/bcg_gamma/gamma-facet/badges/version.svg
    :target: https://anaconda.org/BCG_Gamma/artkit

.. |python_versions| image:: https://img.shields.io/badge/python-3.10|3.11|3.12-blue.svg
   :target: https://www.python.org/downloads/release/python-3100/

.. |code_style| image:: https://img.shields.io/badge/code%20style-black-000000.svg
   :target: https://github.com/psf/black

.. |made_with_sphinx_doc| image:: https://img.shields.io/badge/Made%20with-Sphinx-1f425f.svg
   :target: https://bcg-x-official.github.io/facet/index.html

.. |license_badge| image:: https://img.shields.io/badge/License-Apache%202.0-olivegreen.svg
   :target: https://opensource.org/licenses/Apache-2.0

.. image:: https://github.com/BCG-X-Official/artkit/actions/workflows/artkit-release-pipeline.yml/badge.svg
    :target: https://github.com/BCG-X-Official/artkit/actions/workflows/artkit-release-pipeline.yml
    :alt: ARTKIT Release Pipeline

.. |Contributor_Convenant| image:: https://img.shields.io/badge/Contributor%20Covenant-2.1-4baaaa.svg 
   :target: CODE_OF_CONDUCT.md

.. End-Badges
  • Update the RELEASE_NOTES.rst file:
    • Increment a minor version here

Add LLM connector for Amazon Bedrock API

Justification

Amazon Bedrock provides an extensive suite of Gen AI models via a single API. Supporting Amazon Bedrock will significantly enhance ARTKIT's versatility.

Details

Note that the Bedrock API supports multiple modalities, depending on the selected model. This issue is for LLMs. The new class will live in src/artkit/model/llm/aws/

Be sure to update the docs:

Please see the tutorial: Creating New Model Classes

Test bug report

Describe the bug

A clear and concise description of what the bug is.

To Reproduce

Steps to reproduce the behavior:

  1. Go to '...'
  2. Click on '....'
  3. Scroll down to '....'
  4. See error

Expected behavior

A clear and concise description of what you expected to happen.

Screenshots

If applicable, add screenshots to help explain your problem.

Desktop (please complete the following information):

  • OS: Mac/Windows
  • Version [e.g. 22]

Additional context

Add any other context about the problem here.

Expose ChatbotModel as part of default artkit.api export

Is your feature request related to a problem? Please describe.

Previously, classes in model.llm.base such as ChatModel were reachable through artkit.api, and it has been popular to use ak.ChatModel for type annotations in notebooks. However, this is no longer possible because from .model.llm.base import * is missing from api.py.

Describe the solution you'd like

Selectively include classes from .base modules, such as ChatModel: from .model.llm.base import ChatModel
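
For context, below is a minimal sketch of the notebook pattern this would restore, based on how ak.ChatModel is used in the Quick Start (and assuming the import is added as proposed):

import artkit.api as ak

# Annotate a pipeline step with the ChatModel base type
async def my_step(prompt: str, llm: ak.ChatModel):
    response = await llm.get_response(message=prompt)
    yield {"response": response[0]}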

Describe alternatives you've considered

An alternative is to import everything from the base modules, which might be too broad.

Add links to Jupyter notebooks on GitHub at the top of Sphinx tutorials

Problem

The tutorials in our Sphinx documentation are HTML pages built from Jupyter notebooks. Readers may want to download the Jupyter notebook so they can run the code, but it isn't clear where they can find the notebook.

Solution

Within each notebook tutorial, include a hyperlink to the Jupyter notebook on GitHub. From GitHub, the user can choose to download the notebook and explore related files (e.g., data or cache files which are used by the notebook).

Details

Place hyperlinks just beneath the main title of notebooks, e.g.:

# Building Your First ARTKIT Pipeline

[View notebook on GitHub](https://github.com/BCG-X-Official/artkit/blob/1.0.x/sphinx/source/user_guide/introduction_to_artkit/building_your_first_artkit_pipeline.ipynb)

Add hyperlinks to all Jupyter notebooks which are included in the User Guide and Examples sections of the Sphinx documentation.

In the artkit GitHub repo, you can find these files within:

  • sphinx/source/user_guide/
  • sphinx/source/examples/

Verifying your changes

  • Double check that you included correct hyperlinks for all notebook tutorials in the User Guide and Examples
  • Note that some docs pages are built from RST files instead of notebooks (e.g., the index pages for the User Guide and Examples, the intro to Gen AI Testing and Evaluation in the User Guide). These pages do not need hyperlinks.

Test issue notification

(Unfilled default bug-report template, identical to the "Test bug report" issue above.)

Add connector for any HTTP endpoint

Is your feature request related to a problem? Please describe.

The only connectors available today are for specific model providers with clearly defined APIs. There is no way for a user to specify an arbitrary HTTP endpoint to get a response.

Describe the solution you'd like

I want to be able to use ARTKIT with any endpoint that accepts a payload.

Describe alternatives you've considered

None

Additional context

It seems like a version of the existing HuggingFace connector could be leveraged.
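
As a rough illustration, a generic connector could wrap an async HTTP client behind the same get_response interface used in the Quick Start. Everything below is an assumption (the class name, payload schema, and base-class contract are not part of the current ARTKIT API), sketched here with httpx:

import httpx

class HTTPEndpointConnector:
    """Illustrative sketch only; not part of the ARTKIT API."""

    def __init__(self, url: str, timeout: float = 30.0) -> None:
        self.url = url
        self.timeout = timeout

    async def get_response(self, message: str) -> list[str]:
        # POST the prompt as JSON; the request and response schemas are
        # assumptions and must be adapted to the target endpoint
        async with httpx.AsyncClient(timeout=self.timeout) as client:
            resp = await client.post(self.url, json={"prompt": message})
            resp.raise_for_status()
            return [resp.json()["response"]]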

`artkit.api` does not import `CachedDiffusionModel` and `CachedVisionModel`

Describe the bug

artkit.api does not import CachedDiffusionModel and CachedVisionModel.

To Reproduce

Steps to reproduce the behavior:

  1. Run the Connecting to Gen AI Models notebook: https://github.com/BCG-X-Official/artkit/blob/1.0.x/sphinx/source/user_guide/introduction_to_artkit/connecting_to_genai_models.ipynb
  2. The OpenAI "text-to-image" and "vision" sections will fail

Expected behavior

CachedDiffusionModel and CachedVisionModel should be imported with artkit.api

Additional context

The fix is to update the following imports in src/artkit/api.py, which currently do not seem to import anything:

from .model.diffusion import *
from .model.vision import *

to:

from .model.diffusion.base import *
from .model.vision.base import *

Add diffusion connector for Amazon Bedrock API

Justification

Amazon Bedrock provides an extensive suite of Gen AI models via a single API. Supporting Amazon Bedrock will significantly enhance ARTKIT's versatility.

Details

Note that the Bedrock API supports multiple modalities, depending on the selected model. This issue is for text-to-image diffusion models. The new class will live in src/artkit/model/diffusion/aws/

Be sure to update the docs:

Please see the tutorial: Creating New Model Classes

Third issue notification test

(Unfilled default bug-report template, identical to the "Test bug report" issue above.)

Add LLM connector for Cohere

Justification

Cohere focuses on providing low-latency LLMs. They offer an async endpoint we can use, as well as free API keys.

Details

The new class will live in src/artkit/model/llm/cohere/

Be sure to update the docs:

Please see the tutorial: Creating New Model Classes

New test for issue notification

(Unfilled default bug-report template, identical to the "Test bug report" issue above.)

Support clearing cached results created or accessed after a given `datetime`

Issue

Currently, CacheDB.clear allows users to clear cached results which were accessed or created before a given datetime.

However, it is not possible to clear results after a given datetime. This would be useful when I make a mistake midway through an analysis and I want to clear only the most recent results from my cache.
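
A hypothetical sketch of the desired call, mirroring the existing before-style filter (the created_after parameter name is illustrative, not part of the current API):

from datetime import datetime, timedelta

# cache: an existing CacheDB instance
# Hypothetical: clear only results created within the last hour
cache.clear(created_after=datetime.now() - timedelta(hours=1))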

Solution

  • Update the CacheDB class
  • Update the CachedGenAIModel class (which uses CacheDB)

Details

Once implemented, be sure to:

Docs: Progress bar for pipelines

We plan to add an optional progress bar to the run method in the dependency, fluxus: BCG-X-Official/fluxus#23

It will then be available to the run method in artkit and we should introduce it in the "Building Your First ARTKIT Pipeline" tutorial, specifically in the section where we introduce the core ARTKIT functions.

Once the progress bar is added and we have updated the fluxus version in artkit, we will re-open this issue.
