LLM App Genie - Get started building generative AI into your apps
LLM App Genie is a fully private chat companion that offers several functionalities, each providing a reference implementation of a dialog-based search bot using interchangeable large language models (LLMs).
It combines intelligent search with generative AI to give factual answers based on private knowledge bases, APIs, and SQL databases. The RAG search is built on Amazon OpenSearch Service and Amazon Kendra. The Agents implementation covers various open source APIs and an SQL query generator. The solution also leverages Streamlit for the frontend and Langchain for the conversation flow to implement an interactive chat application.
The core features of this implementation are:
- Fully private Chatbot with generative AI capabilities on private data
- Flexible browser-based webcrawler using Scrapy and Playwright to download and ingest webpages from public and private sources into the knowledge base, see full webcrawler feature list
- Support for different knowledge bases (OpenSearch or Amazon Kendra)
- Semantic Search with Amazon Kendra or custom vector embeddings with OpenSearch
- Support for Financial Analysis and SQL database query generator through Agents
- Fine-tuning of LLMs to increase model relevance and quality of answers
- Free choice of different LLMs and search engines
- End-to-end automated deployment using AWS CDK
The following screenshot shows the application user interface in RAG mode:
We provide infrastructure as code for the solution in AWS CDK.
There are three main deployment types:
- Fast, minimal, fully managed deployment: use AWS managed services Amazon Kendra and Amazon Bedrock.
- Modular deployment: Deploy each component (stack) individually.
- Full deployment: Deploy all components (stacks) of the solution.
You need the following frameworks on your local computer. You can also use the AWS Cloud9 IDE for deployment and testing (choose the Ubuntu OS to have more dependencies already installed).
- Node.js version 18 or higher
- AWS CDK version 2.91.0 or higher (latest tested CDK version is 2.91.0)
- Python 3.10
- Python Poetry
- Docker
If using Cloud9, first increase the disk space so that CDK can use Docker and pull images without problems:
curl -o resize.sh https://raw.githubusercontent.com/aws-samples/semantic-search-aws-docs/main/cloud9/resize.sh
chmod +x ./resize.sh
./resize.sh 50
To install Node 18 or higher:
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.38.0/install.sh | bash
nvm install 18.18.1
To install the latest AWS CDK version:
npm install -g aws-cdk
To install Python Poetry (see the Poetry installation details for alternatives):
- Linux, macOS, Windows (WSL):
curl -sSL https://install.python-poetry.org | python3 -
- Windows (PowerShell):
(Invoke-WebRequest -Uri https://install.python-poetry.org -UseBasicParsing).Content | py -
To clone the GitHub repository:
git clone https://github.com/aws-samples/llm-app-genie.git
Note: Automated deployment is implemented in 06_automation; go to this folder and proceed with the remaining steps:
cd llm-app-genie/06_automation
Change to the infrastructure as code directory and install the dependencies:
poetry install
poetry shell
Log in with your AWS credentials. This step is not needed if you already have credentials loaded, or if you started Cloud9 with a user that has all the required permissions.
aws configure
In Cloud9, when switching to a role different from the AWS managed default, remember to delete the file ~/.aws/credentials first, then execute aws configure.
If required, you can change the AWS region used for the deployment by setting the following environment variable:
export AWS_DEFAULT_REGION=<aws_region>
Also, you need to set the Account ID of the account to be used for deployment:
export CDK_DEFAULT_ACCOUNT=<your_account_id>
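For example, a complete environment setup might look like this (the account ID and region below are placeholders; use your own values):

```shell
# Placeholder account ID and region; replace with your own values
export AWS_DEFAULT_REGION=eu-west-1
export CDK_DEFAULT_ACCOUNT=123456789012
echo "Deploying to account $CDK_DEFAULT_ACCOUNT in region $AWS_DEFAULT_REGION"
# With valid credentials loaded, you can double-check the active identity:
# aws sts get-caller-identity
```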
You can review the CDK Deployment Flow to understand which roles are used and what access rights each role has.
In a nutshell, you can bootstrap CDK (cdk bootstrap) using, for example, credentials with Administrator access, which creates a set of scoped roles (cdk-file-publishing-role, cdk-image-publishing-role, cdk-deploy-role, cdk-exec-role).
cdk bootstrap
You can trigger the deployment through CDK, which assumes the file publishing, image publishing, and deploy roles to initiate the deployment through AWS CloudFormation, which in turn uses the passed cdk-exec-role IAM role to create the required resources.
Note that the deployment user does not need the rights to create the resources directly.
If you want a minimal deployment using fully managed AWS services, you can follow these instructions. You need to deploy:
- A knowledge base (KB) with a search index over ingested documents based on Amazon Kendra.
- A Chatbot front-end which orchestrates the conversation with langchain, using Large Language Models available in Amazon Bedrock.
Step 1: Deploying the knowledge base on Amazon Kendra
To deploy the Kendra index and data sources, follow the instructions in Deploying Amazon Kendra
cdk deploy GenieKendraIndexStack GenieKendraDataSourcesStack --require-approval never
KendraIndexStack creates an Amazon Kendra index and a WEBCRAWLERV2 data source pointing at the website you specified in the app config. The default configuration points to the last 10 pages of media releases from the federal website admin.ch.
The stack deployment takes about 30 minutes.
Step 2: Deploy the chatbot based on Amazon Bedrock LLMs
Before deploying the chatbot, you need to decide whether to update the Amazon Bedrock region in the deployment config, which is set by default to us-west-2.
You can check more information in the Amazon Bedrock section in 03_chatbot/README.md.
Note: When using Amazon Bedrock, remember that although the service is now Generally Available, the models need to be activated in the console.
cdk deploy GenieChatBotStack --require-approval never
The deployment takes about 10 minutes.
The link to access the chatbot can be found in the CloudFormation output variables for the stack, in the region used for the deployment.
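As a sketch, you can read the stack outputs with the AWS CLI (the stack name matches the deploy command above; the command is echoed here so you can run it once valid credentials are loaded):

```shell
# Build the CLI call that lists the stack outputs, including the chatbot URL
STACK="GenieChatBotStack"
CMD="aws cloudformation describe-stacks --stack-name $STACK --query Stacks[0].Outputs --output table"
echo "$CMD"
# eval "$CMD"   # uncomment to execute against your AWS account
```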
Since the chatbot is exposed to the public internet, the UI is protected by a login form. The credentials are automatically generated and stored in AWS Secrets Manager. The Streamlit credentials can be retrieved either by navigating to AWS Secrets Manager in the console, or by using the AWS CLI.
# username
aws secretsmanager get-secret-value --secret-id GenieStreamlitCredentials | jq -r '.SecretString' | jq -r '.username'
# password
aws secretsmanager get-secret-value --secret-id GenieStreamlitCredentials | jq -r '.SecretString' | jq -r '.password'
When connecting to the website, you will see a self-signed certificate error from the browser. You can ignore the error and proceed to the website.
You can also deploy individual components only. Genie automatically detects which components are available based on resource tags (defined in the deployment config) and uses them accordingly. Check the automation readme for more details.
The Genie components are:
- A large language model (LLM).
- A knowledge base (KB) with a search index over ingested documents. Genie queries the KB and uses the returned documents to enhance the LLM prompt and provide document links in the response. Amazon Kendra or Amazon OpenSearch serves as the knowledge base and provides the search capabilities.
- A Chatbot front-end which orchestrates the conversation with langchain.
- Amazon SageMaker Studio domain for experimentation
Note: Genie is the default application prefix. In case you change it, make sure to modify the commands below.
Step 1: Deploying the LLM
The solution deploys the Falcon 40B Instruct LLM by default.
cdk deploy GenieLlmPipelineStack --require-approval never
After the deployment is completed, you can navigate to AWS CodePipeline to monitor the deployment progress (it takes between 10 and 15 minutes).
Step 2: Deploying the knowledge base
To deploy the Amazon OpenSearch index, follow the instructions below.
cdk deploy GenieOpenSearchDomainStack GenieOpenSearchIngestionPipelineStack --require-approval never
GenieOpenSearchDomainStack deploys an OpenSearch domain with direct internet access, protected by an IAM role.
Similarly, you can deploy GenieOpenSearchIngestionPipelineStack, which initiates the pipeline that creates a SageMaker real-time endpoint for computing embeddings and a custom crawler to download the website defined in the buildspec.yml. It also ingests the documents downloaded by the crawler into the OpenSearch domain.
To deploy the Kendra index and data sources, follow the instructions in Deploying Amazon Kendra
cdk deploy GenieKendraIndexStack GenieKendraDataSourcesStack --require-approval never
KendraIndexStack creates an Amazon Kendra index and a WEBCRAWLERV2 data source pointing at the website you specified in the app config.
The stack deployment takes about 30 minutes. The crawling and ingestion pipeline can take longer depending on the size of the website, the number of downloaded documents, and the crawler depth you specified.
Step 3: Deploy the chatbot
The chatbot requires the following configuration:
- the SageMaker endpoint name for the embeddings
- the SageMaker endpoint name for the LLM
- the endpoint for the OpenSearch domain
- the OpenSearch index to query
- the Kendra index ID
[Optional] Use chatbot with Amazon Bedrock:
- Before deploying the chatbot, you need to decide whether to update the Amazon Bedrock region in the deployment config, which is set by default to us-west-2. You can find more information in the Amazon Bedrock section in 03_chatbot/README.md.
Note: When using Amazon Bedrock, remember that although the service is now Generally Available, the models need to be activated in the console.
These configurations are identified by specific resource tags deployed alongside the resources. The chatbot dynamically detects the available resources based on these tags. If you want to personalize the chatbot icons, you can do so by updating the configuration in the ./03_chatbot/chatbot/appconfig.json file.
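As a sketch of how such tag-based discovery can be inspected manually, the Resource Groups Tagging API lists resources carrying a given tag. The tag key below is a placeholder; use the actual tag names defined in the deployment config:

```shell
# Placeholder tag key; check the deployment config for the real tag names
TAG_KEY="appPrefix"
CMD="aws resourcegroupstaggingapi get-resources --tag-filters Key=$TAG_KEY"
echo "$CMD"
# eval "$CMD"   # run with valid AWS credentials to list the tagged resources
```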
cdk deploy GenieChatBotStack --require-approval never
The chatbot UI is protected by a login form. The credentials are automatically generated and stored in AWS Secrets Manager. The Streamlit credentials can be retrieved either by navigating to AWS Secrets Manager in the console, or by using the AWS CLI.
# username
aws secretsmanager get-secret-value --secret-id GenieStreamlitCredentials | jq -r '.SecretString' | jq -r '.username'
# password
aws secretsmanager get-secret-value --secret-id GenieStreamlitCredentials | jq -r '.SecretString' | jq -r '.password'
By default, we deploy a self-signed certificate to enable encrypted communication between the browser and the chatbot. The default configuration of the self-signed certificate can be found in dev.json:
{
  ...
  "self_signed_certificate": {
    "email_address": "[email protected]",
    "common_name": "example.com",
    "city": ".",
    "state": ".",
    "country_code": "AT",
    "organization": ".",
    "organizational_unit": ".",
    "validity_seconds": 157680000
  }
}
The validity_seconds value of 157680000 corresponds to 5 years.
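For reference, a roughly equivalent self-signed certificate can be generated locally with OpenSSL; file names are illustrative, and 1825 days equals the 157680000 seconds (5 years) from the config:

```shell
# Generate a self-signed certificate similar to the dev.json defaults
# (country AT, common name example.com, 5 years validity)
openssl req -x509 -newkey rsa:2048 -nodes \
  -keyout key.pem -out cert.pem -days 1825 \
  -subj "/C=AT/CN=example.com"
# Inspect the generated certificate:
openssl x509 -in cert.pem -noout -subject -enddate
```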
To avoid the self-signed certificate error in the browser, we recommend deploying your own certificate to the chatbot. You can import your own certificate into AWS Certificate Manager, or generate a new one if you have a domain registered in Route 53, and point it to the Application Load Balancer of the chatbot.
Step 4: Deploy the SageMaker Studio domain
Amazon SageMaker Studio provides an environment where you can experiment in notebooks with different LLMs, embeddings, and fine-tuning.
The solution provides notebooks for experimentation. For example:
- ./00_llm_endpoint_setup/deploy_embeddings_model_sagemaker_endpoint.ipynb to deploy a SageMaker endpoint to help create document embeddings with HuggingFace's Transformers.
- ./00_llm_endpoint_setup/deploy_falcon-40b-instruct.ipynb to deploy Falcon 40b Foundation Model, either real-time or asynchronous.
cdk deploy SageMakerStudioDomainStack --require-approval never
You can deploy all the components (stacks) of the solution at once.
Note: By default, this solution deploys the Falcon 40B LLM model on an ml.g5.12xlarge compute instance on Amazon SageMaker. Please ensure your account has an appropriate service quota for this instance type in the AWS region where you want to deploy the solution. Alternatively, you can enable your favorite models in Amazon Bedrock, in your preferred region.
The simplest way is to use Launch Stack (to be added), which uses an AWS CloudFormation template to bootstrap an AWS CodeCommit repo from the GitHub repository and triggers a full deployment using AWS CodePipeline.
Alternatively, you can use the next steps to deploy the full solution with CDK.
Step 1: Deploying with CDK
Make sure your current working directory is 06_automation
.
The following commands will list the available stacks and deploy the whole solution.
CDK adds the application prefix (Genie by default) to all stacks.
cdk ls
cdk deploy --all --require-approval never
The most relevant app configuration parameters are loaded from the deployment config.
We provide an alternative way to deploy the solution by setting up a CI/CD pipeline in your AWS account. We first deploy the infrastructure for the deployment pipeline. We use:
- AWS CodeCommit to host the git repo
- AWS CodeBuild to deploy the full solution via cdk
- AWS CodePipeline to orchestrate the deployment
In the default settings, the pipeline triggers only for the develop branch, but this can be changed.
cd <path-to-cloned-repo>/06_automation
cdk deploy GenieDeploymentPipelineStack --require-approval never
You can configure Git to authenticate against AWS CodeCommit using your AWS credentials. See here for more information.
cd <path-to-cloned-repo>/
pip install git-remote-codecommit
git remote set-url origin codecommit://Genie-repo
git push
The solution is flexible and will automatically discover the available resources, including Amazon Bedrock models, knowledge bases (Amazon Kendra and Amazon OpenSearch), and available LLM endpoints. This means you can decide which knowledge base you combine with which LLM. If you do not have access to Amazon Bedrock, or if it is not available in the AWS Region of your choice, you need to deploy an LLM on Amazon SageMaker.
The most common scenarios are:
- Amazon Kendra + Large LLM on Amazon Bedrock (Claude v2 100K)
- Amazon Kendra + Large LLM on Amazon SageMaker (Falcon 40B)
- Amazon OpenSearch + Large LLM on Amazon Bedrock (Claude Instant 12K)
- Amazon OpenSearch + Light LLM on Amazon SageMaker (Falcon 7B)
If you want to report a bug, please open an issue in this repository with the Default bug issue template. If you would like to request a new feature, please open an issue with the Feature Request template.
In case you have a general question or simply need help, please open an issue with the I need Help issue template so that we can get in touch with you.
You can add knowledge (textual content) by ingesting it to the available knowledge bases.
The main options are:
- Add additional data sources to Amazon Kendra and run the ingestion
- Manually add knowledge to Amazon OpenSearch by:
  - Retriggering the ingestion pipeline by changing the CodeCommit repo created by the GenieOpenSearchIngestionPipelineStack
An example of LLM fine-tuning is provided in two steps for the Falcon 40B model: the actual tuning and the deployment of the tuned model.
The fine-tuning is performed using QLoRA, a technique that quantizes a model to 4 bits while retaining full-precision performance. This technique makes it possible to fine-tune models with up to 65 billion parameters on a single GPU and achieves state-of-the-art results on language tasks.
The deployment is done using the Hugging Face Text Generation Inference (TGI) container, which enables high-performance inference using tensor parallelism and dynamic batching.
If you want to dive deeper beyond the default configuration of the chatbot please read the 03_chatbot/README.md.
This solution will generate costs in your AWS account, depending on the modules used. The main cost drivers are expected to be the real-time Amazon SageMaker endpoints and the knowledge base (e.g. Amazon OpenSearch Service, Amazon Kendra), as these services are always up and running.
Amazon SageMaker endpoints can host the LLM for text generation, as well as the embeddings model used in combination with Amazon OpenSearch. Their pricing model is based on instance type, number of instances, and time running (billed per second). The default configuration uses (pricing in USD for the Ireland AWS Region as of September 2023):
- 1 x ml.g5.12xlarge for the LLM ($7.91/hour)
- 1 x ml.g4dn.xlarge for the embeddings ($0.821/hour)
Note that extra cost may apply when using commercial models through the AWS Marketplace (e.g.: AI21 Labs LLM models).
You can delete the Amazon SageMaker endpoints during non-working hours to pause the cost of the LLM running on them, or use asynchronous endpoints. For a pay-per-token pricing model, use Amazon Bedrock, which bills by the number of input and output tokens. This means that, if you do not use the application, there is no cost from the LLM.
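As a sketch of the pause-and-resume approach (the endpoint name below is hypothetical; list yours with aws sagemaker list-endpoints), deleting an endpoint stops its cost while the endpoint config survives, so the endpoint can be recreated later:

```shell
# Hypothetical endpoint name; the endpoint config is not deleted, so the
# endpoint can be recreated from it. The commands are echoed here; remove
# the echoes to actually run them with valid AWS credentials.
ENDPOINT="genie-llm-endpoint"
echo "aws sagemaker delete-endpoint --endpoint-name $ENDPOINT"
echo "aws sagemaker create-endpoint --endpoint-name $ENDPOINT --endpoint-config-name ${ENDPOINT}-config"
```

The `-config` suffix on the endpoint config name is an assumption for illustration; use the config name shown in the SageMaker console for your deployment.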
With regards to the knowledge bases, you can choose between Amazon Kendra and Amazon OpenSearch Service. The Amazon Kendra pricing model depends on the edition you choose (Developer or Enterprise). The Developer Edition is limited to a maximum of 10,000 documents, 4,000 queries per day, and 5 data sources. If you need more than that, or you are running in production, you should use the Enterprise Edition.
Amazon OpenSearch Service pricing is based on instance type, number of instances, time running (billed per second), and EBS storage attached. The default configuration uses a single node cluster with 1 x t3.medium.search instance and 100 GB EBS storage (gp2).
Finally, the application relies on an Amazon ECS task running on AWS Fargate and on an Amazon DynamoDB table. AWS Fargate pricing model is based on requested vCPU, memory, and CPU architecture, and billed per second. The default configuration uses 1 vCPU and 2 GB of memory, and uses Linux/x86_64 architecture. The default solution provisions a DynamoDB Standard table with on-demand capacity. DynamoDB pricing dimensions include read and write request units and storage.
Pricing examples of LLM and knowledge base for four scenarios (prices in USD for Ireland AWS Region as of September 2023):
Amazon Bedrock + Amazon Kendra
- See Amazon Bedrock console for model pricing
- Amazon Kendra Developer Edition: $810/month
- Monthly total = $810 + Amazon Bedrock cost
Work hours Large LLM on Amazon SageMaker + Amazon OpenSearch
- Real-time endpoints, 8 hours/day, 20 days/month = 160 hours/month.
- Endpoint for LLM on 1 x ml.g5.12xlarge: $7.91 x 160 = $1265.6
- Endpoint for embeddings on 1 x ml.g4dn.xlarge: $0.821 x 160 = $131.36
- Amazon OpenSearch Service: t3.medium.search + 100 GB EBS (gp2) = 720 h/month x $0.078/hour + $0.11/GB-month x 100 GB = $67.16
- Monthly total = $1265.6 + $131.36 + $67.16 = $1464.12
Work hours Large LLM on Amazon SageMaker + Amazon Kendra
- Real-time endpoint based on 1 x ml.g5.12xlarge, 8 hours/day, 20 days/month: $7.91 x 8 x 20 = $1265.6
- Amazon Kendra Developer Edition: $810
- Monthly total = $810 + $1265.6 = $2075.60
Always-on light LLM on Amazon SageMaker + Amazon Kendra
- Real-time endpoint based on 1 x ml.g5.4xlarge, 24/7 (720 hours/month): $2.27 x 720 = $1634.4
- Amazon Kendra Developer Edition: $810
- Monthly total = $810 + $1634.4 = $2444.40
| Item | Description | Monthly Costs |
|---|---|---|
| Knowledge Base - Amazon Kendra | Developer Edition (maximum of 10,000 documents, 4,000 queries per day, and 5 data sources) | 810.00 USD |
| Knowledge Base - Amazon OpenSearch | 1 x ml.g4dn.xlarge for embeddings plus 1 x t3.medium.search instance with 100 GB EBS storage (gp2) | 198.52 USD |
| Full LLM (Falcon 40B) | ml.g5.12xlarge (CPU: 48, 192 GiB, GPU: 4), 8 hours/day x 20 days x 7.09 USD/hour | 1,134.40 USD |
| Light LLM (Falcon 7B) | ml.g5.4xlarge (CPU: 16, 64 GiB, GPU: 1), 8 hours/day x 20 days x 2.03 USD/hour | 324.80 USD |
To clean up the resources, you first need to delete the SageMaker endpoints created by the two AWS CodePipeline pipelines, since they are not managed by CDK.
aws cloudformation delete-stack --stack-name GenieLLMSageMakerDeployment
aws cloudformation delete-stack --stack-name GenieEmbeddingsSageMakerDeployment
Then, you can remove the stacks created by CDK
cdk destroy --all
Below you can see the repository structure. We use different environments for each component. If you want to submit a pull request, please follow the local development guide.
- How to set up the development environment for the chatbot? Follow the chatbot Readme.
- How to set up the development environment for the automation project? Follow the Full Deployment section.
We appreciate your collaboration because it is key to success. However, we want to make sure that contributions can be maintained in the future, so please create an issue with the proposed improvements and get feedback before you put in the effort to implement the change.
If you want to contribute a bug fix please use the Default pull request template.
- 00_llm_endpoint_setup
- Embedding endpoint setup
- LLM endpoint setup
- 01_crawler
  - Web crawler which downloads content from a public or private web page recursively using playwright and Mozilla's readability.js plugin. For more details see the README
- 02_ingestion
- Splitting and ingestion of the downloaded webpage paragraphs into a vector store (OpenSearch) using semantic embeddings.
- 03_chatbot
- Chatbot application based on Streamlit and Langchain
- 04_finetuning
- LLM fine-tuning pipelines
- 05_doc
- Solution documentation
- 06_automation
- Infrastructure as code (CDK)
In preparation
We use Trunk for security scans, code quality, and formatting. If you plan to contribute to this repository please install Trunk.
Step 1: Install Trunk
To use trunk locally:
If you are on macOS, run:
brew install trunk-io
If you are on a different OS or not using Homebrew, run:
curl https://get.trunk.io -fsSL | bash
For other installation options and details on exactly what Trunk installs or how to uninstall it, see the Install Trunk doc.
Step 2: Initialize Trunk in a git repo
From the root of a git repo, run:
trunk init
See also https://github.com/trunk-io/ for additional information on trunk.
In preparation
Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. SPDX-License-Identifier: MIT-0