LLM App Genie - Get started building generative AI into your apps
LLM App Genie is a fully private chat companion that offers several functionalities, each providing a reference implementation of a dialog-based search bot using interchangeable large language models (LLMs).
It combines intelligent search with generative AI to give factual answers based on private knowledge bases, APIs, and SQL databases. The RAG search is built on Amazon OpenSearch Service and Amazon Kendra. The Agents implementation covers various open source APIs and an SQL query generator. The solution also leverages Streamlit for the frontend and Langchain for the conversation flow to implement an interactive chat application.
The core features of this implementation are:
- Fully private Chatbot with generative AI capabilities on private data
- Flexible browser-based webcrawler using Scrapy and Playwright to download and ingest webpages from public and private sources into the knowledge base, see full webcrawler feature list
- Support for different knowledge bases (OpenSearch or Amazon Kendra)
- Semantic Search with Amazon Kendra or custom vector embeddings with OpenSearch
- Support for Financial Analysis and SQL database query generator through Agents
- Fine-tuning of LLMs to increase model relevance and quality of answers
- Free choice of different LLMs and search engines
- End-to-end automated deployment using AWS CDK
The following screenshot shows the application user interface in RAG mode:
We provide infrastructure as code for the solution in AWS CDK.
There are three main deployment types:
- Fast, minimal, fully managed deployment: use AWS managed services Amazon Kendra and Amazon Bedrock.
- Modular deployment: Deploy each component (stack) individually.
- Full deployment: Deploy all components (stacks) of the solution.
You need the following frameworks on your local computer. You can also use the AWS Cloud9 IDE for deployment and testing (choose the Ubuntu OS to have more dependencies already installed).
- Node.js version 18 or higher
- AWS CDK version 2.91.0 or higher (latest tested CDK version is 2.91.0)
- Python 3.10
- Python Poetry
- Docker
If using Cloud9, first increase the disk space so that CDK can use Docker and pull images without problems:
curl -o resize.sh https://raw.githubusercontent.com/aws-samples/semantic-search-aws-docs/main/cloud9/resize.sh
chmod +x ./resize.sh
./resize.sh 50
To install Node 18 or higher:
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.38.0/install.sh | bash
nvm install 18.18.1
To install the latest AWS CDK version:
npm install -g aws-cdk
To install Python Poetry (see the Poetry installation details for alternatives):
- Linux, macOS, Windows (WSL):
curl -sSL https://install.python-poetry.org | python3 -
- Windows (PowerShell):
(Invoke-WebRequest -Uri https://install.python-poetry.org -UseBasicParsing).Content | py -
To clone the GitHub repository:
git clone https://github.com/aws-samples/llm-app-genie.git
Note: Automated deployment is implemented in 06_automation; go to this folder and proceed with the remaining steps:
cd llm-app-genie/06_automation
Change to the infrastructure as code directory and install the dependencies:
poetry install
poetry shell
Log in with your AWS credentials. This step is not needed if you already have credentials loaded, or if you started Cloud9 with a user that has all the required permissions.
aws configure
In Cloud9, when switching to a role different from the AWS managed default, remember to delete the file ~/.aws/credentials first, then execute aws configure.
If required, you can change the AWS region used for the deployment by setting the following environment variable:
export AWS_DEFAULT_REGION=<aws_region>
Also, you need to set the Account ID of the account to be used for deployment:
export CDK_DEFAULT_ACCOUNT=<your_account_id>
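For example, a complete environment setup might look like this (the account ID and region below are placeholders; use your own values):

```shell
# Placeholder account ID and region; replace with your own values
export AWS_DEFAULT_REGION=eu-west-1
export CDK_DEFAULT_ACCOUNT=123456789012
echo "Deploying to account $CDK_DEFAULT_ACCOUNT in region $AWS_DEFAULT_REGION"
# With valid credentials loaded, you can double-check the active identity:
# aws sts get-caller-identity
```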
You can review the CDK Deployment Flow to understand which roles are used and what access rights each role has.
In a nutshell, you can bootstrap CDK (cdk bootstrap) using, for example, credentials with Administrator access, which creates a set of scoped roles (cdk-file-publishing-role, cdk-image-publishing-role, cdk-deploy-role, cdk-exec-role).
cdk bootstrap
You can trigger the deployment through CDK, which assumes the file publishing, image publishing, and deploy roles to initiate the deployment through AWS CloudFormation, which in turn uses the passed cdk-exec-role IAM role to create the required resources.
Note that the deployment user does not need the rights to create the resources directly.
If you want a minimal deployment using fully managed AWS services, you can follow these instructions. You need to deploy:
- A knowledge base (KB) with a search index over ingested documents based on Amazon Kendra.
- A Chatbot front-end which orchestrates the conversation with langchain, using Large Language Models available in Amazon Bedrock.
Step 1: Deploying the knowledge base on Amazon Kendra
To deploy the Kendra index and data sources, follow the instructions in Deploying Amazon Kendra
cdk deploy GenieKendraIndexStack GenieKendraDataSourcesStack --require-approval never
KendraIndexStack creates an Amazon Kendra index and a WEBCRAWLERV2 data source pointing at the website you specified in the app config. The default configuration points to the last 10 pages of media releases from the federal website admin.ch.
The stack deployment takes about 30 minutes.
Step 2: Deploy the chatbot based on Amazon Bedrock LLMs
Before deploying the chatbot, you need to decide whether to update the Amazon Bedrock region in the deployment config, which is set by default to us-west-2.
You can check more information in the Amazon Bedrock section in 03_chatbot/README.md.
Note: When using Amazon Bedrock, remember that although the service is now Generally Available, the models need to be activated in the console.
cdk deploy GenieChatBotStack --require-approval never
The deployment takes about 10 minutes.
The link to access the chatbot can be found in the CloudFormation output variables for the stack, in the region used for the deployment.
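As a sketch, you can read the stack outputs with the AWS CLI (the stack name matches the deploy command above; the command is echoed here so you can run it once valid credentials are loaded):

```shell
# Build the CLI call that lists the stack outputs, including the chatbot URL
STACK="GenieChatBotStack"
CMD="aws cloudformation describe-stacks --stack-name $STACK --query Stacks[0].Outputs --output table"
echo "$CMD"
# eval "$CMD"   # uncomment to execute against your AWS account
```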
Since the chatbot is exposed to the public internet, the UI is protected by a login form. The credentials are automatically generated and stored in AWS Secrets Manager. The Streamlit credentials can be retrieved either by navigating to AWS Secrets Manager in the console, or by using the AWS CLI.
# username
aws secretsmanager get-secret-value --secret-id GenieStreamlitCredentials | jq -r '.SecretString' | jq -r '.username'
# password
aws secretsmanager get-secret-value --secret-id GenieStreamlitCredentials | jq -r '.SecretString' | jq -r '.password'
When connecting to the website, you will see a self-signed certificate error from the browser. You can ignore the error and proceed to the website.
You can also deploy individual components only. Genie automatically detects which components are available based on resource tags (defined in the deployment config) and uses them accordingly. Check the automation readme for more details.
The Genie components are:
- A large language model (LLM).
- A knowledge base (KB) with a search index over ingested documents. Genie queries the KB and uses the returned documents to enhance the LLM prompt and provide document links in the response. Amazon Kendra or Amazon OpenSearch serves as the knowledge base and provides the search capabilities.
- A Chatbot front-end which orchestrates the conversation with langchain.
- Amazon SageMaker Studio domain for experimentation
Note: Genie is the default application prefix. In case you change it, make sure to modify the commands below.
Step 1: Deploying the LLM
The solution deploys the Falcon 40B Instruct LLM by default.
cdk deploy GenieLlmPipelineStack --require-approval never
After the deployment is completed, you can navigate to AWS CodePipeline to monitor the deployment progress (it takes between 10 and 15 minutes).
Step 2: Deploying the knowledge base
To deploy the Amazon OpenSearch index, follow the instructions below.
cdk deploy GenieOpenSearchDomainStack GenieOpenSearchIngestionPipelineStack --require-approval never
GenieOpenSearchDomainStack deploys an OpenSearch domain with direct internet access, protected by an IAM role.
Similarly, you can deploy GenieOpenSearchIngestionPipelineStack, which initiates the pipeline that creates a SageMaker real-time endpoint for computing embeddings and a custom crawler to download the website defined in the buildspec.yml. It also ingests the documents downloaded by the crawler into the OpenSearch domain.
To deploy the Kendra index and data sources, follow the instructions in Deploying Amazon Kendra
cdk deploy GenieKendraIndexStack GenieKendraDataSourcesStack --require-approval never
KendraIndexStack creates an Amazon Kendra index and a WEBCRAWLERV2 data source pointing at the website you specified in the app config.
The stack deployment takes about 30 minutes. The crawling and ingestion pipeline can take longer depending on the size of the website, the number of downloaded documents, and the crawler depth you specified.
Step 3: Deploy the chatbot
The chatbot requires the following configuration:
- the SageMaker endpoint name for the embeddings
- the SageMaker endpoint name for the LLM
- the endpoint for the OpenSearch domain
- the OpenSearch index to query
- the Kendra index ID
[Optional] Use chatbot with Amazon Bedrock:
- Before deploying the chatbot, you need to decide whether to update the Amazon Bedrock region in the deployment config, which is set by default to us-west-2. You can find more information in the Amazon Bedrock section in 03_chatbot/README.md.
Note: When using Amazon Bedrock, remember that although the service is now Generally Available, the models need to be activated in the console.
These configurations are identified by specific resource tags deployed alongside the resources. The chatbot dynamically detects the available resources based on these tags. If you want to personalize the chatbot icons, you can do so by updating the configuration in the ./03_chatbot/chatbot/appconfig.json file.
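As a sketch of how such tag-based discovery can be inspected manually, the Resource Groups Tagging API lists resources carrying a given tag. The tag key below is a placeholder; use the actual tag names defined in the deployment config:

```shell
# Placeholder tag key; check the deployment config for the real tag names
TAG_KEY="appPrefix"
CMD="aws resourcegroupstaggingapi get-resources --tag-filters Key=$TAG_KEY"
echo "$CMD"
# eval "$CMD"   # run with valid AWS credentials to list the tagged resources
```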
cdk deploy GenieChatBotStack --require-approval never
The chatbot UI is protected by a login form. The credentials are automatically generated and stored in AWS Secrets Manager. The Streamlit credentials can be retrieved either by navigating to AWS Secrets Manager in the console, or by using the AWS CLI.
# username
aws secretsmanager get-secret-value --secret-id GenieStreamlitCredentials | jq -r '.SecretString' | jq -r '.username'
# password
aws secretsmanager get-secret-value --secret-id GenieStreamlitCredentials | jq -r '.SecretString' | jq -r '.password'
By default, we deploy a self-signed certificate to enable encrypted communication between the browser and the chatbot. The default configuration of the self-signed certificate can be found in dev.json:
{
  ...
  "self_signed_certificate": {
    "email_address": "[email protected]",
    "common_name": "example.com",
    "city": ".",
    "state": ".",
    "country_code": "AT",
    "organization": ".",
    "organizational_unit": ".",
    "validity_seconds": 157680000
  }
}
The validity_seconds value of 157680000 corresponds to 5 years.
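For reference, a roughly equivalent self-signed certificate can be generated locally with OpenSSL; file names are illustrative, and 1825 days equals the 157680000 seconds (5 years) from the config:

```shell
# Generate a self-signed certificate similar to the dev.json defaults
# (country AT, common name example.com, 5 years validity)
openssl req -x509 -newkey rsa:2048 -nodes \
  -keyout key.pem -out cert.pem -days 1825 \
  -subj "/C=AT/CN=example.com"
# Inspect the generated certificate:
openssl x509 -in cert.pem -noout -subject -enddate
```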
To avoid the self-signed certificate error in the browser, we recommend deploying your own certificate to the chatbot. You can import your own certificate into AWS Certificate Manager, or generate a new one if you have a domain registered in Route 53, and point it to the Application Load Balancer of the chatbot.
Step 4: Deploy the SageMaker Studio domain
Amazon SageMaker Studio provides an environment where you can experiment in notebooks with different LLMs, embeddings, and fine-tuning.
The solution provides notebooks for experimentation. For example:
- ./00_llm_endpoint_setup/deploy_embeddings_model_sagemaker_endpoint.ipynb to deploy a SageMaker endpoint to help create document embeddings with HuggingFace's Transformers.
- ./00_llm_endpoint_setup/deploy_falcon-40b-instruct.ipynb to deploy Falcon 40b Foundation Model, either real-time or asynchronous.
cdk deploy SageMakerStudioDomainStack --require-approval never
You can deploy all the components (stacks) of the solution at once.
Note: By default, this solution deploys the Falcon 40B LLM model on an ml.g5.12xlarge compute instance on Amazon SageMaker. Please ensure your account has an appropriate service quota for this instance type in the AWS region where you want to deploy the solution. Alternatively, you can enable your favorite models in Amazon Bedrock, in your preferred region.
The simplest way is to use Launch Stack (to be added), which uses an AWS CloudFormation template to bootstrap an AWS CodeCommit repo from the GitHub repository and triggers a full deployment using AWS CodePipeline.
Alternatively, you can use the next steps to deploy the full solution with CDK.
Step 1: Deploying with CDK
Make sure your current working directory is 06_automation
.
The following commands will list the available stacks and deploy the whole solution.
CDK adds the application prefix (Genie by default) to all stacks.
cdk ls
cdk deploy --all --require-approval never
The most relevant app configuration parameters are loaded from the deployment config.
We provide an alternative way to deploy the solution by setting up a CI/CD pipeline in your AWS account. We first deploy the infrastructure for the deployment pipeline. We use:
- AWS CodeCommit to host the git repo
- AWS CodeBuild to deploy the full solution via cdk
- AWS CodePipeline to orchestrate the deployment
In the default settings, the pipeline triggers only for the develop branch, but this can be changed.
cd <path-to-cloned-repo>/06_automation
cdk deploy GenieDeploymentPipelineStack --require-approval never
You can configure Git to authenticate against AWS CodeCommit using your AWS credentials. See here for more information.
cd <path-to-cloned-repo>/
pip install git-remote-codecommit
git remote set-url origin codecommit://Genie-repo
git push
The solution is flexible and will automatically discover the available resources, including Amazon Bedrock models, knowledge bases (Amazon Kendra and Amazon OpenSearch), and available LLM endpoints. This means you can decide which knowledge base you combine with which LLM. If you do not have access to Amazon Bedrock, or if it is not available in the AWS Region of your choice, you need to deploy an LLM on Amazon SageMaker.
The most common scenarios are:
- Amazon Kendra + Large LLM on Amazon Bedrock (Claude v2 100K)
- Amazon Kendra + Large LLM on Amazon SageMaker (Falcon 40B)
- Amazon OpenSearch + Large LLM on Amazon Bedrock (Claude Instant 12K)
- Amazon OpenSearch + Light LLM on Amazon SageMaker (Falcon 7B)
If you want to report a bug, please open an issue in this repository with the Default bug issue template. If you would like to request a new feature, please open an issue with the Feature Request template.
In case you have a general question or simply need help, please open an issue with the I need Help issue template so that we can get in touch with you.
You can add knowledge (textual content) by ingesting it to the available knowledge bases.
The main options are:
- Add additional data sources to Amazon Kendra and run the ingestion
- Manually add knowledge to Amazon OpenSearch by:
  - Retriggering the ingestion pipeline by changing the CodeCommit repo created by the GenieOpenSearchIngestionPipelineStack
An example of LLM fine-tuning is provided in two steps for the Falcon 40B model: the actual tuning and the deployment of the tuned model.
The fine-tuning is performed using QLoRA, a technique that quantizes a model to 4 bits while retaining full-precision performance. This technique makes it possible to fine-tune models with up to 65 billion parameters on a single GPU and achieves state-of-the-art results on language tasks.
The deployment is done using the Hugging Face Text Generation Inference (TGI) container, which enables high-performance inference using tensor parallelism and dynamic batching.
If you want to dive deeper beyond the default configuration of the chatbot please read the 03_chatbot/README.md.
This solution will generate costs in your AWS account, depending on the modules used. The main cost drivers are expected to be the real-time Amazon SageMaker endpoints and the knowledge base (e.g. Amazon OpenSearch Service, Amazon Kendra), as these services are always up and running.
Amazon SageMaker endpoints can host the LLM for text generation, as well as the embeddings model used in combination with Amazon OpenSearch. Their pricing model is based on instance type, number of instances, and time running (billed per second). The default configuration uses (pricing in USD for the Ireland AWS Region as of September 2023):
- 1 x ml.g5.12xlarge for the LLM ($7.91/hour)
- 1 x ml.g4dn.xlarge for the embeddings ($0.821/hour)
Note that extra cost may apply when using commercial models through the AWS Marketplace (e.g.: AI21 Labs LLM models).
You can delete the Amazon SageMaker endpoints during non-working hours to pause the cost of the LLM running on them, or use asynchronous endpoints. For a pay-per-token pricing model, use Amazon Bedrock, which bills by the number of input and output tokens. This means that, if you do not use the application, there is no cost from the LLM.
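As a sketch of the pause-and-resume approach (the endpoint name below is hypothetical; list yours with aws sagemaker list-endpoints), deleting an endpoint stops its cost while the endpoint config survives, so the endpoint can be recreated later:

```shell
# Hypothetical endpoint name; the endpoint config is not deleted, so the
# endpoint can be recreated from it. The commands are echoed here; remove
# the echoes to actually run them with valid AWS credentials.
ENDPOINT="genie-llm-endpoint"
echo "aws sagemaker delete-endpoint --endpoint-name $ENDPOINT"
echo "aws sagemaker create-endpoint --endpoint-name $ENDPOINT --endpoint-config-name ${ENDPOINT}-config"
```

The `-config` suffix on the endpoint config name is an assumption for illustration; use the config name shown in the SageMaker console for your deployment.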
With regards to the knowledge bases, you can choose between Amazon Kendra and Amazon OpenSearch Service. The Amazon Kendra pricing model depends on the edition you choose (Developer or Enterprise). The Developer Edition is limited to a maximum of 10,000 documents, 4,000 queries per day, and 5 data sources. If you need more than that, or you are running in production, you should use the Enterprise Edition.
Amazon OpenSearch Service pricing is based on instance type, number of instances, time running (billed per second), and EBS storage attached. The default configuration uses a single node cluster with 1 x t3.medium.search instance and 100 GB EBS storage (gp2).
Finally, the application relies on an Amazon ECS task running on AWS Fargate and on an Amazon DynamoDB table. AWS Fargate pricing model is based on requested vCPU, memory, and CPU architecture, and billed per second. The default configuration uses 1 vCPU and 2 GB of memory, and uses Linux/x86_64 architecture. The default solution provisions a DynamoDB Standard table with on-demand capacity. DynamoDB pricing dimensions include read and write request units and storage.
Pricing examples of LLM and knowledge base for four scenarios (prices in USD for Ireland AWS Region as of September 2023):
Amazon Bedrock + Amazon Kendra
- See Amazon Bedrock console for model pricing
- Amazon Kendra Developer Edition: $810/month
- Monthly total = $810 + Amazon Bedrock cost
Work hours Large LLM on Amazon SageMaker + Amazon OpenSearch
- Real-time endpoints, 8 hours/day, 20 days/month = 160 hours/month.
- Endpoint for LLM on 1 x ml.g5.12xlarge: $7.91 x 160 = $1265.6
- Endpoint for embeddings on 1 x ml.g4dn.xlarge: $0.821 x 160 = $131.36
- Amazon OpenSearch Service: t3.medium.search + 100 GB EBS (gp2) = 720 h/month x $0.078/hour + $0.11/GB-month x 100 GB = $67.16
- Monthly total = $1265.6 + $131.36 + $67.16 = $1464.12
Work hours Large LLM on Amazon SageMaker + Amazon Kendra
- Real-time endpoint based on 1 x ml.g5.12xlarge, 8 hours/day, 20 days/month: $7.91 x 8 x 20 = $1265.6
- Amazon Kendra Developer Edition: $810
- Monthly total = $810 + $1265.6 = $2075.60
Always-on light LLM on Amazon SageMaker + Amazon Kendra
- Real-time endpoint based on 1 x ml.g5.4xlarge, 24/7 (720 hours/month): $2.27 x 720 = $1634.4
- Amazon Kendra Developer Edition: $810
- Monthly total = $810 + $1634.4 = $2444.40
| Item | Description | Monthly Costs |
|---|---|---|
| Knowledge Base - Amazon Kendra | Developer Edition (maximum of 10,000 documents, 4,000 queries per day, and 5 data sources) | 810.00 USD |
| Knowledge Base - Amazon OpenSearch | 1 x ml.g4dn.xlarge for embeddings plus 1 x t3.medium.search instance with 100 GB EBS storage (gp2) | 198.52 USD |
| Full LLM (Falcon 40B) | ml.g5.12xlarge (CPU: 48, 192 GiB, GPU: 4), 8 hours/day x 20 days x 7.09 USD/hour | 1,134.40 USD |
| Light LLM (Falcon 7B) | ml.g5.4xlarge (CPU: 16, 64 GiB, GPU: 1), 8 hours/day x 20 days x 2.03 USD/hour | 324.80 USD |
To clean up the resources, you first need to delete the SageMaker endpoints created by the two AWS CodePipeline pipelines, since they are not managed by CDK.
aws cloudformation delete-stack --stack-name GenieLLMSageMakerDeployment
aws cloudformation delete-stack --stack-name GenieEmbeddingsSageMakerDeployment
Then, you can remove the stacks created by CDK
cdk destroy --all
Below you can see the repository structure. We use different environments for each component. If you want to submit a pull request, please follow the local development guide.
- How to set up the development environment for the chatbot? Follow the chatbot Readme.
- How to set up the development environment for the automation project? Follow the Full Deployment section.
We appreciate your collaboration because it is key to success. However, we want to make sure that contributions can be maintained in the future, so please create an issue with the proposed improvements and get feedback before you put in the effort to implement the change.
If you want to contribute a bug fix please use the Default pull request template.
- 00_llm_endpoint_setup
- Embedding endpoint setup
- LLM endpoint setup
- 01_crawler
  - Web crawler which downloads content from a public or private web page recursively using playwright and Mozilla's readability.js plugin. For more details see the README
- 02_ingestion
- Splitting and ingestion of the downloaded webpage paragraphs into a vector store (OpenSearch) using semantic embeddings.
- 03_chatbot
- Chatbot application based on Streamlit and Langchain
- 04_finetuning
- LLM fine-tuning pipelines
- 05_doc
- Solution documentation
- 06_automation
- Infrastructure as code (CDK)
In preparation
We use Trunk for security scans, code quality, and formatting. If you plan to contribute to this repository please install Trunk.
Step 1: Install Trunk
To use trunk locally:
If you are on macOS, run:
brew install trunk-io
If you are on a different OS or not using Homebrew, run:
curl https://get.trunk.io -fsSL | bash
For other installation options and details on exactly what Trunk installs or how to uninstall it, see the Install Trunk doc.
Step 2: Initialize Trunk in a git repo
From the root of a git repo, run:
trunk init
See also https://github.com/trunk-io/ for additional information on trunk.
In preparation
Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. SPDX-License-Identifier: MIT-0