âšī¸ You will run this lab in your own AWS account. Please follow directions at the end of the lab to remove resources to avoid future costs.
This lab was derived directly from one of Operataional Excellence Labs named Automating operations with Playbooks and Runbooks in AWS Well-Architected Lab.
Manually running your runbooks and playbooks for operational activities has a number of drawbacks:
- Activities are prone to errors & difficult to trace.
- Manual activities do not allow your operational practice to scale in line with your business requirements.
In contrast, implementing automation in these activities has the following benefits:
- Improved reliability by preventing the introduction of errors through manual processes.
- Increased scalability by allowing non linear resource investment to operate your workload.
- Increased traceability on your operation through log collection of the automation activity.
- Improved incident response by reducing idle time and automatically triggering activity based on known events.
Click here if you would like to know what runbook and playbook are
At a glance, both runbooks and playbooks appear to be similar documents that technical users, can use to perform operational activities. However, there an essential difference between them:
-
A playbook documents contain processes that guides you through activities to investigate an issue. For example, gathering applicable information, identifying potential sources of failure, isolating faults, or determining the root cause of issues. Playbooks can follow multiple paths and yield more than one outcome.
-
A runbook contains procedures necessary to achieve a specific outcome. For example, creating a user, rolling back configuration, or scaling resource to resolve the issue identified.
This hands-on lab will guide you through the steps to automate your operational activities using runbooks and playbooks built with AWS tools.
We will show how you can build automated runbooks and playbooks to investigate and remediate application issues using the following AWS services:
- An AWS account that you are able to use for testing. The account should not be used for production purposes.
- An IAM user in your AWS account with full access to CloudFormation, Amazon ECS,Amazon RDS, Amazon Virtual Private Cloud (VPC), AWS Identity and Access Management (IAM), AWS Cloud9
NOTE: You will be billed for any applicable AWS resources used if you complete this lab that are not covered in the AWS Free Tier.
This lab walks you through creating a CI/CD workflow for serveress applications.
- Step 1. Deploy the sample application environment
- Step 2. Simulate an Application Issue
- Step 3. Build and Run an Investigative Playbook
- Step 4. Build and Run Remediation Runbook
- Teardown
- Summary
In this section, you will prepare a sample application. The application is an API hosted inside a docker container, using Amazon Elastic Compute Service (ECS).. The container is accessed via an Application Load Balancer.
The API is a private microservice within your Amazon Virtual Private Cloud (VPC). Communication to the API can only be done privately through routes within the VPC subnet. In our lab example, the business owner has agreed to run the API over HTTP protocol to simplify the implementation.
The API has two actions available which encrypt and decrypt information. This is triggered by doing a REST POST call to the /encrypt / /decrypt methods as appropriate.
- The encrypt action will allow you to pass a secret message along with a 'Name' key as the identifier and it will return a 'Secret Key Id' that you can use later to decrypt your message.
- The decrypt action allows you to then decrypt the secret message passing along the 'Name' key and 'Secret Key Id' you obtained before to get your secret message.
Both actions will make a write and read call to the application database hosted in Amazon Relation Database Service (RDS), where the encrypted messages are stored.
The following step-by-step instructions will provision the application that you will use with your runbooks and playbooks .
Explore the contents of the CloudFormation script to learn more about the environment and application.
You will use this sample application as a sandbox to simulate an application performance issue, start your runbooks and playbooks to autonomously investigate and remediate.
- You will prepare the Cloud9 workspace launched with a new VPC.
- You will run the application build script from the Cloud9 console to build the sample application as shown in the diagram below.
In this first step you will provision a CloudFormation stack that builds a Cloud9 workspace along with the VPC for the sample application. This Cloud9 workspace will be used to run the provisioning script of the sample application. You can choose the to deploy stack in one of the regions below.
-
Click on the link below to deploy the stack. This will take you to the CloudFormation console in your account. Use
walab-ops-base-resources
as the stack name, and take the default values for all options. -
Once the template is deployed, wait until the CloudFormation Stack reaches the CREATE_COMPLETE state.
Next, run the build script to build and deploy you application environment from the Cloud9 workspace as follows:
-
From the main console, access the Cloud9 service.
-
Click Environments section on the left menu, and locate an environment named
WellArchitectedOps-walab-ops-base-resources
as below, then click Open. -
Your environment will bootstrap the lab repository. You should see a terminal output showing the following output:
When the bootstrap script finishes you will see a folder called
aws-well-architected-labs
. -
In the IDE terminal console, change directory to the working folder where the build script is located:
cd ~/environment/aws-well-architected-labs/static/Operations/200_Automating_operations_with_playbooks_and_runbooks/Code/scripts/
-
Copy and paste the command below, replacing
[email protected]
and[email protected]
with the email address you would like the application to notify you with. Replace the[email protected]
value with email representing system operators team and[email protected]
with email address representing business owner.bash build_application.sh walab-ops-base-resources [email protected] [email protected]
The
build_application.sh
script will build and deploy your sample application, along with the architecture that hosts it. The application architecture will have capabilities to notify systems operators and owners, leveraging Amazon Simple Notification Service. You can use the same email address for[email protected]
and[email protected]
if you need to, but ensure that you have both values specified.
If you have deployed Amazon ECS before in your account, you may encounter InvalidInput error with message "AWSServiceRoleForECS has been taken" while running the build_application.sh script. You can safely ignore this message, as the script will continue despite the error.
-
The above command runs the build and provisioning of the application stack. The script should take about 20 mins to finish.
The
build_application.sh
will deploy the application docker image and push it to Amazon ECR. This is used by Amazon ECS. Once the build script completes, another CloudFormation stack containing the application resources (ECS, RDS, ALB, and others) will be deployed.
-
In the CloudFormation console, you should see a new stack being deployed called
walab-ops-sample-application
. Wait until the stack reaches CREATE_COMPLETE state and proceed to the next step.
Once the application is successfully deployed, go to your CloudFormation console and locate the stack named walab-ops-sample-application
.
-
Confirm that the stack is in a 'CREATE_COMPLETE' state.
-
Record the following output details as it will be required later:
-
Take note of the DNS value specified under OutputApplicationEndpoint of the Outputs.
The screenshot below shows the output from the CloudFormation stack:
-
Check for an email sent to the system operator and owner addresses you've specified in the build_application.sh script. This email should also be visible in the CloudFormation parameter under in the SystemOpsNotificationEmail and SystemOwnerNotificationEmail.
-
Click
confirm subscription
on the email links to subscribe.
There will be 2 emails sent to your address, please ensure to subscribe to both of them.
In this section, you will be testing the encrypt API action from the deployed application.
The application will take a JSON payload with Name
as the identifier and Text
key as the value of the secret message.
The application will encrypt the value under Text
key with a designated KMS key and store the encrypted text in the RDS database with Name
as the primary key.
Note: For simplicity purposes the sample application will re-use the same KMS keys for each record generated.
Click here to test
-
In the Cloud9 terminal, run the command below, replacing the
ApplicationEndpoint
with the OutputApplicationEndpoint from previous step. This command will run curl to send a POST request with the secret message payload{"Name":"Bob","Text":"Run your operations as code"}
to the API.ALBEndpoint="ApplicationEndpoint"
curl --header "Content-Type: application/json" --request POST --data '{"Name":"Bob","Text":"Run your operations as code"}' $ALBEndpoint/encrypt
-
Once you run this command, you should see output as follows:
{"Message":"Data encrypted and stored, keep your key save","Key":"EncryptKey"}
-
Take note of the encrypt key value under Key .
-
Run the command below, pasting the encrypt key you took note of previously under the Key section to test the decrypt API.
curl --header "Content-Type: application/json" --request GET --data '{"Name":"Bob","Key":"EncryptKey"}' $ALBEndpoint/decrypt
-
Once you run the command you should see the following output:
{"Text":"Run your operations as code"}
You have now completed the first section of the Lab.
You should have a sample application API which we will use for the remainder of the lab.
Understanding the health of your workload is an essential component of Operational Excellence. Defining metrics and thresholds, together with appropriate alerts will ensure that issues can be acknowledged and remediated within an appropriate timeframe.
In this section of the lab, you will simulate a performance issue within the API. Using Amazon CloudWatch synthetic, your API will utilize a canary monitor, which continuously checks API response time to detect an issue.
In this example, should the API take longer than 6 seconds to respond, an alert will be created, triggering a notification email.
- You will run a script that will send a large amount of traffic to the API.
- You will observe and confirm the issue through AWS monitoring tools.
The following resources had been deployed to perform these actions.
In this section, you will send multiple concurrent requests to the application, simulating a large surge of incoming traffic. This will overwhelm the API, which will gradually increase the response time of the application. This results in the canary monitoring exceeding the set threshold, triggering the CloudWatch Alarm to send notification.
Follow below steps to continue:
-
From the Cloud9 terminal, run the command shown below to change directory to the working script folder:
cd ~/environment/aws-well-architected-labs/static/Operations/200_Automating_operations_with_playbooks_and_runbooks/Code/scripts/
-
Confirm that you have the
test.json
in the folder and it contains the following text:{"Name":"Test User","Text":"This Message is a Test!"}
-
Go to CloudFormation console and take note of the OutputApplicationEndpoint value under Output tab of
walab-ops-sample-application
stack. This is the DNS endpoint of the Application Load Balancer. -
Make sure you have test the application previously. If so, execute the command below:
bash simulate_request.sh $ALBEndpoint
This script uses the Apache Benchmark to send 60,000,000 requests, 3000 concurrent request at a time.
When you run the command you will see the output gradually change from a consistently successful 200 response to include 504 time-out responses.
The requests generated by the script are overwhelming the application API and result in occasional timeouts by your load balancer.
Keep the command running in the background as you proceed through the lab.
-
After approximately 6 minutes, you will see an alarm which is triggered as a response to the generated activity. This will trigger an email indicating that the CloudWatch alarm has been triggered.
-
Check and confirm the alarm by going to the CloudWatch console.
-
Click on the Alarms section on the left menu.
-
Click on the Alarms called
mysecretword-canary-duration-alarm
, which should be in an alarm state. -
Click on the alarm to display the CloudWatch metrics that the alarm data is based from.
-
The alarm is based on the
Duration
metric data emitted by themysecretword-canary
CloudWatch synthetic canary monitor. The Duration metric measures how long it takes for the canary requests to receive a response from the application. -
The alarm is triggered whenever the value of the
Duration
metric is above 6 seconds within a 1 minute duration. The latest threshold will be 5000 for 3 datapoints within 6 minutes. -
On the left menu click on **Application monitoring and Synthetics Canaries and locate the canary monitor named
mysecretword-canary
. -
Click on the canary and the select the Configuration tab.
-
From here you will see the canary configuration and a snippet of the canary script.
-
In the canary script section, scroll down to the section that contains
let requestOptionStep1
as shown in the screenshot below. This is the configuration that controls the destination of the request (hostname, path and payload body). -
Click on the Monitoring tab.
-
From here you will see the visualization of the metrics that the canary monitor generates.
-
Locate the 'Duration' metric that is being used to trigger the CloudWatch alarm.
-
You will see the average duration value of the canary request representing the time to complete. A value above 6000ms signifies that the request has taken more than 6 seconds to receive a response from the application, indicating a performance issue in the API.
You have now completed the second section of the lab.
You should still have the simulate_request.sh
running in the background, simulating a large influx of traffic to your API. This causes the application to respond slowly and time-out periodically. The CloudWatch Alarm will be triggering and performance issue notifications sent to your System Operator to prompt them into action.
This concludes Section 2 of this lab. Click 'Next step' to continue to the next section of the lab where we will build an automated playbook to assist investigation of the issue.
The efficiency of issue resolution within an Operations team is directly linked to their tenure and experience. Where an Operator has prior knowledge of a particular issue, they will have a headstart in being able to reach resolution in terms of understanding logs and metrics which were used in previous situations. Whilst this constitutes value to an Operations group, it also represents a single point of failure and a scalability challenge.
This is where playbooks become important. Playbooks are a documented set of predefined steps, which are run to identify an issue. The result of each step can be used to either call more steps to run, or alternatively to trigger manual intervention.
Automating playbook activities wherever possible, is critical to reducing the time to respond to an incident.
The AWS Cloud offers multiple services you can use to build an automated playbook, one which is AWS Systems Manager.
AWS Systems Manager offers an automation document capability (known within Systems Manager as runbooks), which allows for the creation of a series of executable steps to orchestrate your investigation and remediation. AWS Systems Manager Automation Documents allow a user to run custom scripts, call AWS service APIs, or even run remote commands on cloud or on-premise compute instances.
In this section, you will focus on creating an automated playbook in assisting your investigation, as a Systems Operator.
- You will build a playbook to gather information about the workload and query the relevant metrics and logs.
- You will run the automation document to investigate your issue.
The Systems Manager Automation Document you are building will require assumed permissions to run the investigation and remediation steps. You will need to create the IAM role that will assume the permissions to perform the playbook activities. To simplify the deployment process, a CloudFormation template has been provided that you can deploy via the console or AWS CLI. Please choose one of the two following deployment steps:
Click here for CloudFormation Console deployment step
Click here for CloudFormation CLI deployment step (Preferred way)
Note: To deploy from the command line, ensure that you have installed and configured AWS CLI with the appropriate credentials.
- From the Cloud9 terminal change to the appropriate folder as shown:
cd ~/environment/aws-well-architected-labs/static/Operations/200_Automating_operations_with_playbooks_and_runbooks/Code/templates
- Then run the command listed below:
aws cloudformation create-stack --stack-name waopslab-automation-role \
--capabilities CAPABILITY_NAMED_IAM \
--template-body file://automation_role.yml
- Confirm that the stack has installed correctly. You can do this by running the describe-stacks command:
aws cloudformation describe-stacks --stack-name waopslab-automation-role
Locate the StackStatus and confirm it is set to CREATE_COMPLETE
-
Once you have deployed the CloudFormation stack above, go to the IAM Console.
-
On the side menu, click on Roles and locate the IAM role named AutomationRole.
-
Take note of the ARN of the role, as we will need it later in the lab.
In preparation for the investigation, you need to know all services and resources associated to the issue. When the email notification is sent, information in the email does not contain any resources information. To gather this necessary information, we will build a playbook to acquire all related resources using our CloudWatch alarm ARN as a reference.
Codifying your playbook with AWS Systems Manager allows for maximum code reusability. This will reduce overhead in re-writing codes that has identical objectives.
Note: Follow these step to build and run playbook. Select a guide to deploy using either the AWS console, the AWS CLI or via a CloudFormation template deployment.
Click here for CloudFormation Console deployment step
Download the template here.
If you decide to deploy the stack from the console, ensure that you follow below requirements & step:
- Follow this guide for information on how to deploy the CloudFormation template.
- Use
waopslab-playbook-gather-resources
as the Stack Name, as this is referenced by other stacks later in the lab.
Click here for CloudFormation CLI deployment step (Preferred way)
Note: To deploy from the command line, ensure that you have installed and configured AWS CLI with the appropriate credentials.
- From the Cloud9 terminal, run the command to get into the working script folder
cd ~/environment/aws-well-architected-labs/static/Operations/200_Automating_operations_with_playbooks_and_runbooks/Code/templates
- Then run the below commands, replacing the 'AutomationRoleArn' with the Arn of AutomationRole you took note in previous step 3.0.
aws cloudformation create-stack --stack-name waopslab-playbook-gather-resources \
--parameters ParameterKey=PlaybookIAMRole,ParameterValue=AutomationRoleArn \
--template-body file://playbook_gather_resources.yml
Example:
aws cloudformation create-stack --stack-name waopslab-playbook-gather-resources \
--parameters ParameterKey=PlaybookIAMRole,ParameterValue=arn:aws:iam::000000000000:role/AutomationRole \
--template-body file://playbook_gather_resources.yml
Note: Please adjust your command-line if you are using profiles within your aws command line as required.
- Confirm that the stack has installed correctly. You can do this by running the describe-stacks command below, locate the StackStatus and confirm it is set to CREATE_COMPLETE.
aws cloudformation describe-stacks --stack-name waopslab-playbook-gather-resources
Click here for Console step-by-step
- Go to the AWS Systems Manager console. Click Documents under Shared Resources on the left menu. Then click Create Automation as show in the screen shot below:
- Enter
Playbook-Gather-Resources
in the Name field and copy the notes shown below into the Document description field.
# What does this **playbook** do?
Query the CloudWatch Synthetics Canary and look for all resources related to the application based on it's Application Tag. This **playbook** takes an input of the CloudWatch Alarm ARN triggered by the canary
Note : Application resources must be deployed using CloudFormation and properly tagged accordingly.
## Actions taken in this playbook.
1. Describe CloudWatch Alarm ARN and identify the Canary resource.
2. Describe the Canary resource to gather the value of 'Application' tag
3. Gather CloudFormation Stack with the same value of 'Application' tag.
4. List all resources in CloudFormation Stack.
5. Parse list of resources into String Output.
- In the Assume role field, enter the IAM role ARN we created in the previous section 3.0 Prepare Automation Document IAM Role.
- Expand the Input Parameters section and enter
AlarmARN
as the Parameter name. Set the type asString
and Required asYes
. This will define a Parameter within our playbook, so that the value of the CloudWatch Alarm ARN can be passed into the playbook to run the action.
-
Under Step 1 section specify
Gather_Resources_For_Alarm
Step name, selectaws::executeScript
as the Action type. -
Under Inputs set
Python3.6
as the Runtime and specifyscript_handler
as the Handler. -
Paste in below python codes into the Script section.
import json
import re
from datetime import datetime
import boto3
import os
def arn_deconstruct(arn):
arnlist = arn.split(":")
service=arnlist[2]
region=arnlist[3]
accountid=arnlist[4]
servicetype=arnlist[5]
name=arnlist[6]
return {
"Service": service,
"Region": region,
"AccountId": accountid,
"Type": servicetype,
"Name": name
}
def locate_alarm_source(alarm):
cwclient = boto3.client('cloudwatch', region_name = alarm['Region'] )
alarm_source = {}
alarm_detail = cwclient.describe_alarms(AlarmNames=[alarm['Name']])
if len(alarm_detail['MetricAlarms']) > 0:
metric_alarm = alarm_detail['MetricAlarms'][0]
namespace = metric_alarm['Namespace']
# Condition if NameSpace is CloudWatch Syntetics
if namespace == 'CloudWatchSynthetics':
if 'Dimensions' in metric_alarm:
dimensions = metric_alarm['Dimensions']
for i in dimensions:
if i['Name'] == 'CanaryName':
source_name = i['Value']
alarm_source['Type'] = namespace
alarm_source['Name'] = source_name
alarm_source['Region'] = alarm['Region']
alarm_source['AccountId'] = alarm['AccountId']
result = alarm_source
return result
def locate_canary_endpoint(canaryname,region):
result = None
synclient = boto3.client('synthetics', region_name = region )
res = synclient.get_canary(Name=canaryname)
canary = res['Canary']
if 'Tags' in canary:
if 'TargetEndpoint' in canary['Tags']:
target_endpoint = canary['Tags']['TargetEndpoint']
result = target_endpoint
return result
def locate_app_tag_value(resource):
result = None
if resource['Type'] == 'CloudWatchSynthetics':
synclient = boto3.client('synthetics', region_name = resource['Region'] )
res = synclient.get_canary(Name=resource['Name'])
canary = res['Canary']
if 'Tags' in canary:
if 'Application' in canary['Tags']:
apptag_val = canary['Tags']['Application']
result = apptag_val
return result
def locate_app_resources_by_tag(tag,region):
result = None
# Search CloufFormation Stacks for tag
cfnclient = boto3.client('cloudformation', region_name = region )
list = cfnclient.list_stacks(StackStatusFilter=['CREATE_COMPLETE','ROLLBACK_COMPLETE','UPDATE_COMPLETE','UPDATE_ROLLBACK_COMPLETE','IMPORT_COMPLETE','IMPORT_ROLLBACK_COMPLETE'] )
for stack in list['StackSummaries']:
app_resources_list = []
stack_name = stack['StackName']
stack_details = cfnclient.describe_stacks(StackName=stack_name)
stack_info = stack_details['Stacks'][0]
if 'Tags' in stack_info:
for t in stack_info['Tags']:
if t['Key'] == 'Application' and t['Value'] == tag:
app_stack_name = stack_info['StackName']
app_resources = cfnclient.describe_stack_resources(StackName=app_stack_name)
for resource in app_resources['StackResources']:
app_resources_list.append(
{
'PhysicalResourceId' : resource['PhysicalResourceId'],
'Type': resource['ResourceType']
}
)
result = app_resources_list
return result
def script_handler(event, context):
result = {}
arn = event['CloudWatchAlarmARN']
alarm = arn_deconstruct(arn)
# Locate tag from CloudWatch Alarm
alarm_source = locate_alarm_source(alarm) # Identify Alarm Source
tag_value = locate_app_tag_value(alarm_source) #Identify tag from source
if alarm_source['Type'] == 'CloudWatchSynthetics':
endpoint = locate_canary_endpoint(alarm_source['Name'],alarm_source['Region'])
result['CanaryEndpoint'] = endpoint
# Locate cloudformation with tag
resources = locate_app_resources_by_tag(tag_value,alarm['Region'])
result['ApplicationStackResources'] = json.dumps(resources)
return result
-
Under Additional inputs specify the input value to the step, passing in the parameter we created previously. To do this, specify below values:
InputPayload
as the Input nameCloudWatchAlarmARN: '{{AlarmARN}}'
as the Input Value.
-
Under Outputs specify below values:
Resources
as Name$.Payload.ApplicationStackResources
as SelectorString
as Type
-
Once your settings match the screenshot below, click on Create Automation
Once the automation document is created, you can now give it a test.
- You can then find the newly created document under the Owned by me tab of the Document section in Systems Manager Console.
- Click on the playbook called
Playbook-Gather-Resources
and click on Execute Automation to run your playbook. - Paste in the CloudWatch Alarm ARN ( You can find this ARN in the email notification in section 2.1 Observing the alarm being triggered ) and click on Execute to test the playbook.
- Once the playbook run is completed successfully, click on the Step Id to see the final message and output of the step. You should be able to see this output listing all the resources of the application
- Copy the Resources list output from the section as highlighted in the screenshot below. This list consist of the all the resources defined in the CloudFormation stack related to our application. These information includes the Elastic Load Balancer, ECS and RDS resource id that we can now use to further our investigation of the underlying issue.
- You can Paste the output into a temporary location like notepad for now. You will need this value for our next step.
In the previous step, you have created a playbook that finds all related AWS resources in the application. In this step you will create a playbook that will interrogate resources, capture recent metrics and logs, to look for insights and better understand the root cause of the issue.
In practice, there can be various possibilities of actions that the playbook can take to investigate, depending on the scenario presented by the issue. The purpose of this Lab is to showcase how you can use playbook to aid investigation, rather than advise on a specific action path.
Therefore, in this lab we will assume an example scenario. The playbook will look at metrics and logs of the ELB, ECS and RDS services in the resource list. The playbook will then highlight the metrics and logs that is considered outside of normal operational threshold.
Please follow the below instructions to build this playbook:
Note: We will deploy this playbook via CloudFormation template to simplify deployment. Please follow the steps below to deploy the CloudFormation template via CLI / or Console.
Click here for CloudFormation Console deployment step
Download the template here.
If you decide to deploy the stack from the console, ensure that you follow below requirements & step:
- Please follow this guide for information on how to deploy the CloudFormation template.
- Use
waopslab-playbook-investigate-resources
as the Stack Name, as this is referenced by other stacks later in the lab.
Click here for CloudFormation CLI deployment step (Preferred way)
- From the Cloud9 terminal, change to the required folder as shown:
cd ~/environment/aws-well-architected-labs/static/Operations/200_Automating_operations_with_playbooks_and_runbooks/Code/templates
- Run the command below, replacing the 'AutomationRoleArn' with the Arn of AutomationRole you took note in previous step 3.0 Prepare Automation Document IAM Role.
aws cloudformation create-stack --stack-name waopslab-playbook-investigate-resources \
--parameters ParameterKey=PlaybookIAMRole,ParameterValue=AutomationRoleArn \
--template-body file://playbook_investigate_application_resources.yml
Example:
aws cloudformation create-stack --stack-name waopslab-playbook-investigate-resources \
--parameters ParameterKey=PlaybookIAMRole,ParameterValue=arn:aws:iam::000000000000:role/xxxx-playbook-role \
--template-body file://playbook_investigate_application_resources.yml
- Confirm that the stack has installed correctly. You can do this by running the describe-stacks command as follows:
aws cloudformation describe-stacks --stack-name waopslab-playbook-investigate-resources
- Locate the StackStatus and confirm it is set to CREATE_COMPLETE
When the document is created, you can go ahead and run a quick test.
You can find the newly created document under the Owned by me tab of the Document resource in the Systems Manager console.
-
Click on the playbook called
Playbook-Investigate-Application-Resources
and click on Execute Automation to run our playbook. -
Paste in the resources list you took note from the output of the previous playbook ( refer to section 3.1 Building the "Gather-Resources" Playbook ) under Resources and click on Execute
-
Under Executed Steps you should be able to see each of the step the playbook. If you view the content of the document you will be able to see the code and find out what each step does.
For simplicity, we have created a list of output and description for each step. Expand the list below to view.
Output list
Step Name Description Output list Gather_ELB_Statistics Go through the resource list and locate the ELB. Query data from the ELB CloudWatch metrics, looking at metrics from the last 60 minutes. TargetResponseTime (Average) HTTPCode_Target_2XX_Count (Sum) HTTPCode_Target_3XX_Count (Sum) HTTPCode_Target_4XX_Count (Sum) HTTPCode_Target_5XX_Count (Sum) TargetConnectionErrorCount (Sum) UnHealthyHostCount (Average) ActiveConnectionCount (Sum) HTTPCode_ELB_3XX_Count (Sum) HTTPCode_ELB_4XX_Count (Sum) HTTPCode_ELB_5XX_Count (Sum) HTTPCode_ELB_500_Count (Sum) HTTPCode_ELB_502_Count (Sum) HTTPCode_ELB_503_Count (Sum) HTTPCode_ELB_504_Count (Sum) Gather_RDS_Statistics Go through resource list and locate the RDS resource. Query data from the RDS CloudWatch metrics, looking at metrics from the last 60 minutes. BinLogDiskUsage (Sum) BinLogDiskUsage (Sum) BurstBalance (Average) CPUUtilization (Average) CPUCreditUsage (Sum) CPUCreditBalance (Maximum) DatabaseConnections (Sum) DiskQueueDepth (Maximum) FailedSQLServerAgentJobsCount (Average) FreeableMemory (Maximum) MaximumUsedTransactionIDs (Maximum) NetworkReceiveThroughput (Average) OldestReplicationSlotLag (Average) ReadIOPS (Average) ReadLatency (Average) ReadThroughput (Maximum) ReplicaLag (Average) ReplicationSlotDiskUsage (Maximum) SwapUsage (Maximum) TransactionLogsDiskUsage (Maximum) TransactionLogsGeneration (Average) ReplicationSlotDiskUsage (Maximum) WriteIOPS (Average) WriteLatency (Average) WriteThroughput (Average) Gather_ECS_Statistics Go through the resource list and locate the ECS resource. Query data from the ECS CloudWatch metrics, looking at metrics from the last 6 minutes. CPUUtilization (Maximum) MemoryUtilization (Maximum) Gather_ECS_Error_Logs Go through the resource list and locate the ECS Service. Search in CloudWatch logs for any Error occurrence. Gather_ECS_Config Go through the resource list and locate the ECS resource. Describe the ECS service configuration. Gather_RDS_Config Go through the resource list and locate the RDS resource. Describe RDS Instance Config & Parameters. Inspect_Playbook_Results Go through the output of above steps, inspect results and check if it is above the threshold. TargetResponseTime = 5 (ELB) TargetConnectionErrorCount= 0 (ELB) UnHealthyHostCount = 0 (ELB) ELB5XXCount = 0 (ELB) ELB500Count = 0 (ELB) ELB502Count = 0 (ELB) ELB503Count = 0 (ELB) ELB504Count = 0 (ELB) Target4XXCount = 0 (ELB) Target5XXCount = 0 (ELB) CPUUtilization = 80 (ECS) -
Wait until all steps are completed successfully.
So far we have 2 separate playbooks. The first playbook gathers the list of resources associated with the application. The second playbook queries the relevant resources and investigates the appropriate logs and metrics.
In this step we will automate our playbooks further by creating a parent playbook that orchestrates the 2 Investigative playbooks. We will add another step to send notification to our Developers and System Owners.
Follow the instructions below to build the parent Playbook.
Note: Select a step-by-step guide below to build the parent playbook using either the AWS console a CloudFormation template.
Click here for CloudFormation Console deployment step
Download the template here.
If you decide to deploy the stack from the console, follow these steps:
- Please follow this guide for information on how to deploy the CloudFormation template.
- Use
waopslab-playbook-investigate-application
as the Stack Name, as this is referenced by other stacks later in the lab. - In the parameter input screen, under PlaybookIAMRole enter ARN of playbook IAM role (defined in previous step), under NotificationEmail enter your designated email for playbook notification
Click here for CloudFormation CLI deployment step (Preferred way)
- From the Cloud9 terminal, change to the required folder as shown:
cd ~/environment/aws-well-architected-labs/static/Operations/200_Automating_operations_with_playbooks_and_runbooks/Code/templates
- Then run below command :
aws cloudformation create-stack --stack-name waopslab-playbook-investigate-application \
--parameters ParameterKey=PlaybookIAMRole,ParameterValue=AutomationRoleArn \
--template-body file://playbook_investigate_application.yml
Example:
aws cloudformation create-stack --stack-name waopslab-playbook-investigate-application \
--parameters ParameterKey=PlaybookIAMRole,ParameterValue=arn:aws:iam::000000000000:role/xxxx-playbook-role \
--template-body file://playbook_investigate_application.yml
Note: Please adjust your command-line if you are using profiles within your aws command line as required.
Confirm that the stack has installed correctly. You can do this by running the describe-stacks command as follows:
aws cloudformation describe-stacks --stack-name waopslab-playbook-investigate-application
Locate the StackStatus and confirm it is set to CREATE_COMPLETE
Click here for Console step-by-step guide
- From the AWS Systems Manager console, click on documents as shown below. Once you are there, click on Create Automation
- Next, enter in
Playbook-Investigate-Application-From-Alarm
in the Name and paste in the notes shown below into the Description box. This provides a description of the playbook. Systems Manager supports putting in notes as markdown, so feel free to format as required.
# What is does this **playbook** do?
This **playbook** will run **Playbook-Gather-Resources** to gather Application resources monitored by Canary.
Then subsequently run **Playbook-Investigate-Application-Resources** to Investigate the resources for issues.
Outputs of the investigation will be sent to SNS Topic Subscriber
-
Under Assume role field, enter in the ARN of the IAM role we created in the previous step.
-
Under Input Parameters field, enter
AlarmARN
as the Parameter name. Set the type asString
and Required asYes
. This will define a Parameter into our playbook, which allows the value of the CloudWatch Alarm to be passed to the main step that will run the action. -
Add another parameter by clicking on the Add a parameter link. Enter
SNSTopicARN
as the Parameter name. Set the type asString
and Required asYes
. This will define another Parameter into our playbook, so that we can send notification to the Owner and Developer.
-
Click Add Step and create the first step of
aws:executeAutomation
Action type with StepNamePlaybookGatherAppResourcesCanaryCloudWatchAlarm
-
Specify
Playbook-Gather-Resources
as the Document name under Inputs and under Additional inputs specifyRuntimeParameters
with{"AlarmARN":'{{AlarmARN}}'}
as it's value (refer to screenshot below). This step we will be run theGather-Resources
playbook which we created previously.
-
Once this step is defined, add another step by clicking on Add Step at the bottom of the section.
-
For this second step, specify the Step name as
PlaybookInvestigateAppResourcesELBECSRDS
and an action type ofaws:executeAutomation
. -
Specify
Playbook-Investigate-Application-Resources
as the Document name andRuntimeParameters
asResources: '{{PlaybookGatherAppResourcesCanaryCloudWatchAlarm.Output}}'
This will take the output of the first step and pass to the second playbook to run the investigation of associated resources.
-
For the last step, take the output investigation from the second step and send that to the SNS topic where our owner, developers and admin are subscribed.
-
Specify the Step name as
AWSPublishSNSNotification
and the action type asaws:executeAutomation
. -
Specify
AWS-PublishSNSNotification
as the Document name andRuntimeParameters
as shown below. This will take the output of the second step which contains summary data of the investigation and AWS-PublishSNSNotification which will send an email to the SNS we specified in the parameters.
TopicArn: '{{SNSTopicARN}}'
Message: '{{ PlaybookInvestigateAppResourcesELBECSRDS.Output }}'
-
Our playbook will run investigative tasks and send the result to an SNS topic where our Systems administrator / engineer will subscribe to. To do this we will need to create an SNS topic that our playbook will send notification to. Please follow the instructions specified in this link and create a Standard SNS topic and name it
PlaybookNotificationSNSTopic
-
Once you've created the topic, go ahead and subscribe your an email using this instruction here
You can now run the playbook to discover the result of the investigation.
-
Go to the Output section of the deployed CloudFormation stack
walab-ops-sample-application
and take note of below output values. -
Go to the Systems Manager Automation document we just created in the previous step,
Playbook-Investigate-Application-From-Alarm
. -
And then run the playbook passing the ARN as the AlarmARN input value, along with the SNSTopicArn.
- You can get the AlarmARN from the email that you received from CloudWatch Alarm as described in step 3.1 Building the "Gather-Resources" Playbook. in this lab.
- To get the value for SNSTopicArn, go to the CloudFormation console output of
walab-ops-sample-application
stack and copy, paste the value of OutputSystemEventTopicArn
- When the playbook completed, an email will be send to you, which contains a summary of the investigation completed by the playbook as shown.
- Copy and paste the message section and use a json linter tool such as jsonlint.com to give better structure for visibility. The result from the playbook investigation might vary slightly, but the overall findings should be similar to the below screenshot.
-
From the report being generated you should see a large number of ELB504Count error and a high TargetResponseTime from the Load balancer. This explains the delay we are seeing from our canary alarm.
If you then look at the ECS summary, you will notice that there is only 1 ECS TaskRunningCount, with a relatively high CPUUtilization average. The script calculates the average of maximum value on the ECS service in the last 6 minutes window. If you do not see CPUUtilization value in the json, you can confirm this by going to the ECS service console and click on the Metrics tab.
Therefore, it is likely that the immediate cause of the latency is resource constrained at the application API level running in ECS. Ideally, if we can increase the number of tasks in the ECS service, the application should be able to release some of the CPU Utilization constraints.
With all of these information provided by our playbook findings, we should be able to determine what is the next course of action to attempt remediation to the issue.
This concludes Section 3 of this lab, click on the link below to move on to the next section to build the remediation runbook.
In contrast to playbooks, runbooks are procedures that accomplish specific tasks to achieve an outcome. In the previous section, you have identified an issue with CPU utilization, which occurs because there is only 1 ECS task running in the cluster. This could be remediated through the use of auto-scaling.
However, implementing this requires preparation and planning. When an incident occurs, operations teams should have a defined escalation path for the issue. Depending on the criticality of the system they should also be equipped to do what is necessary to ensure system availability is protected while the escalation occurs.
In this section, you will build an automated runbook to remediate the CPU utilization issue by increasing the number of tasks in the ECS cluster. Your automated runbook, will notify the owner of the workload and give them the option to be able to intercept the scale-up action should they choose not to proceed.
- You will build a runbook to scale up the ECS cluster, with the approval mechanism.
- You will execute the runbook and observe the recovery of your application.
In this section you will build a reusable runbook, which provides the owner with the ability to deny or approve remediation actions within a defined waiting period. If the wait time is exceeded and a decision has has not been made, the runbook will automatically approve the action as shown.
We will achieve this through the use of a Systems Manager Automation document, which we will build using the following steps:
-
The
Approval-Gate
runbook executes a separate document called theApprove-Timer
. -
The
Approve-Timer
runbook will then wait for a preconfigured amount of time and send an approve signal to theApproval-Gate
runbook. -
Meanwhile, the
Approval-Gate
runbook then sends an approval request to the workload owner via a designated SNS topic.
- If the owner choose to approve, the
Approval-Gate
runbook will continue to the next step. - If the owner declines the approval, the runbook will fail, blocking further steps.
- However, if the owner does not response within the preconfigured wait time, the
Approve-Timer
runbook will automatically approve the request.
Follow the instructions below to build the runbook:
Note: Select a step-by-step guide below to build the runbook using either the AWS console or CloudFormation template.
Click here for Console step by step
-
Go to the AWS Systems Manager console. Click Documents under Shared Resources on the left menu. Then click Create Automation as show in the screen shot below:
-
Enter
Approval-Timer
in the Name field and copy the notes shown below into the Document description field.# What does this automation do? Automatically trigger 'Approval' Signal to an execution, after a timer lapse ## Steps 1. Sleep for X time specified on the parameter input 2. Automatically signal 'Approval' to the Execution specified in parameter input
-
In the Assume role field, enter the IAM role ARN we created in the previous section 3.0 Prepare Automation Document IAM Role.
-
Expand the Input Parameters section and enter
Timer
as the Parameter name. Set the type asString
and Required asYes
. -
Then add another parameter this time called
AutomationExecutionId
, of typeString
and set Required toYes
. Once you are done, your configuration should look like the screenshot below. -
Under Step 1 section specify
SleepTimer
as Step name, selectaws::sleep
as the Action type. -
Expand the Inputs section of the step, and specify
{{Timer}}
as the Duration -
Click on Add step and specify
ApproveExecution
as Step name, selectaws::executeAwsApi
as the Action type. -
Expand the Inputs section of the step, and specify
ssm
in the Service field andSendAutomationSignal
in the API field. -
Under Additional inputs specify below values.
Approve
as the SignalType{{AutomationExecutionId}}
as the AutomationExecutionId.
Once you are done, your configuration should look like the screenshot below.
6 . Click on Create automation once you are done.
Next, you will create the Approval-Gate
runbook responsible for running the Approval-Timer
runbook asynchronously. Follow below steps to complete the configuration:
-
From the AWS Systems Manager console, select Documents under Shared Resources on the left menu. Then click Create Automation as show in the screen shot below:
-
Next, enter
Approval-Gate
in the Name field and add the notes shown below to the Document description field.# What does this automation do? Place a gate before your desired step to create approval mechanism. Automation will trigger an asynchronously timer that will automatically approve once the time has lapsed. Automation will then send approval / deny request to the designated SNS Topic. When deny is triggered by approver, the step will fail and block the following step from executing. Note: Please ensure to have onFailure set to abort in your automation document. ## Steps 1. Trigger an asynchronously timer that will automatically approve once the time has lapsed. 2. Send approval / deny request to the designated SNS Topic.
-
In the Assume role field, enter the IAM role ARN we created in the previous section 3.0 Prepare Automation Document IAM Role.
-
Expand the Input Parameters section and enter the following:
Timer
as the Parameter name, set the type asString
and Required asYes
.NotificationMessage
as the Parameter name, set the type as typeString
and Required isYes
.NotificationTopicArn
as the Parameter name, set the type as typeString
and Required isYes
.ApproverRoleArn
as the Parameter name, set the type as typeString
and Required isYes
.
-
Expand Step 1 create a step named
executeAutoApproveTimer
and action typeaws:executeScript
. -
Expand Inputs, then set the Runtime as
Python3.6
and paste in below code into the script section. Note that code snippet will execute theApproval-Timer
runbook you created asyncronously.import boto3 def script_handler(event, context): client = boto3.client('ssm') response = client.start_automation_execution( DocumentName='Approval-Timer', Parameters={ 'Timer': [ event['Timer'] ], 'AutomationExecutionId' : [ event['AutomationExecutionId'] ] } ) return None
-
Expand Additional Inputs, then select
InputPayload
under Input Name, and add the text shown below to Input Value:AutomationExecutionId: '{{automation:EXECUTION_ID}}' Timer: '{{Timer}}'
Once you have completed this step, your Step 1 configuration should look like below screenshot.
-
Click Add step to create Step 2
-
Create a step named
ApproveOrDeny
and action typeaws:approve
. -
Expand Inputs and specify below values under Approvers, replacing the
AutomationRoleArn
with the Arn of AutomationRole you took note of in section 3.0 Prepare Automation Document IAM Role.[ '{{ApproverRoleArn}}', 'AutomationRoleArn' ]
Example:
[ '{{ApproverRoleArn}}', 'arn:aws:iam::xxxxx:role/AutomationRole' ]
-
Expand Additional Inputs and specify the following values:
NotificationArn
as the Input name, and{{NotificationTopicArn}}
as the Input valueMessage
as the Input name, and{{NotificationMessage}}
as the Input valueMinRequiredApprovals
as the Input name, and1
as the Input value
-
Expand Common properties and change the following properties to below values (keep the remaining as it is):
Continue
for On failurefalse
for Is critical
Once you have completed this step, your Step 2 configuration should look like below screenshot.
-
Click Add step to create Step 3
-
Create a step named
getApprovalStatus
and action typeaws:executeAwsApi
-
Expand Inputs and specify
ssm
in the Service field, andDescribeAutomationStepExecutions
in the API field. -
Expand Additional Inputs and specify below values:
-
AutomationExecutionId
as the Input Name, and{{automation:EXECUTION_ID}}
as the Input value -
Filters
as the Input Name, and copy below values as the Input value- Key: StepName Values: - requestApproval
-
-
Expand Outputs and specify below values:
approvalStatusVariable
as the Name$.StepExecutions[0].Outputs.ApprovalStatus[0]
as the SelectorString
as the Type
Once you have completed this step, your Step 3 configuration should look like below screenshot.
-
Click on Create automation to complete the configuation.
Click here for CloudFormation deployment steps
Download the template here.
If you decide to deploy the stack from the console, ensure that you follow below requirements & step:
- Please follow this guide for information on how to deploy the CloudFormation template.
- Use
waopslab-runbook-approval-gate
as the Stack Name, as this is referenced by other stacks later in the lab.
Click here for CloudFormation CLI deployment step (Preferred way)
-
From the Cloud9 terminal, change to the templates folder as shown:
cd ~/environment/aws-well-architected-labs/static/Operations/200_Automating_operations_with_playbooks_and_runbooks/Code/templates
-
Run the below commands, replacing the
AutomationRoleArn
with the Arn of AutomationRole you took note of in section 3.0 Prepare Automation Document IAM Role.aws cloudformation create-stack --stack-name waopslab-runbook-approval-gate \ --parameters ParameterKey=PlaybookIAMRole,ParameterValue=AutomationRoleArn \ --template-body file://runbook_approval_gate.yml
With your AutomationRole Arn in place your command will look similar to the following example:
aws cloudformation create-stack --stack-name waopslab-runbook-approval-gate \ --parameters ParameterKey=PlaybookIAMRole,ParameterValue=arn:aws:iam::000000000000:role/xxxx-runbook-role \ --template-body file://runbook_approval_gate.yml
-
Confirm that the stack has installed correctly. You can do this by running the describe-stacks command below, locate the StackStatus and confirm it is set to CREATE_COMPLETE.
aws cloudformation describe-stacks --stack-name waopslab-runbook-approval-gate
Next, you are going to build the ECS-Scale-Up runbook which will complete the following:
- Run the
Approval-Gate
runbook which you created previously. - Wait for the
Approval-Gate
runbook to complete. - Once the
Approval-Gate
runbook completes successfully, the runbook will increase the number of ECS tasks in the cluster.
Please follow below steps to build the runbook.
Note: Select a step-by-step guide below to build the runbook using either the AWS console or CloudFormation template.
Click here for Console step by step
-
Go to the AWS Systems Manager console. Click Documents under Shared Resources on the left menu. Then click Create Automation as show in the screen shot below.
-
Next, enter
Runbook-ECS-Scale-Up
in the Name field and add the notes shown below to the Document description field:# What does this automation do? Scale up a given ECS service task desired count to certain number, with approval process. The automation will trigger Approval-Gate runbook, before executing. ## Steps 1. Trigger Approval-Gate 2. Scale ECS Service by number of service
-
In the Assume role field, enter the IAM role ARN we created in the previous section 3.0 Prepare Automation Document IAM Role.
-
Expand the Input Parameters section and enter the following.
ECSDesiredCount
as the Parameter name, set the type asInteger
and Required asYes
.ECSClusterName
as the Parameter name, set the type asString
and Required isYes
.ECSServiceName
, as the Parameter name, set the type asString
and Required isYes
.NotificationTopicArn
, as the Parameter name, set the type asString
and Required isYes
.NotificationMessage
, as the Parameter name, set the type asString
and Required isYes
.ApproverRoleArn
, as the Parameter name, set the type asString
and Required isYes
.Timer
, as the Parameter name, set the type asString
and Required isYes
.
-
Expand Step 1 create a step named
executeApprovalGate
and action typeaws:executeAutomation
. -
Expand Inputs, then set the Document name as
Approval-Gate
. -
Expand Additional inputs and select
RuntimeParameters
as the Input Name -
Paste in below as the Input Value
{
"Timer":'{{Timer}}',
"NotificationMessage":'{{NotificationMessage}}',
"NotificationTopicArn":'{{NotificationTopicArn}}',
"ApproverRoleArn":'{{ApproverRoleArn}}'
}
-
Click Add Step to create the second step.
-
Specify
updateECSServiceDesiredCount
as Step Name and selectaws:executeAwsApi
as Action type. -
Expand Inputs and configure the following values:
ecs
as ServiceUpdateService
as Api
-
Expand Additional inputs and configure the following values:
forceNewDeployment
as the Input Name andtrue
as Input ValuedesiredCount
as the Input Name and{{ECSDesiredCount}}
as Input Valueservice
as the Input Name and{{ECSServiceName}}
as Input Valuecluster
as the Input Name and{{ECSClusterName}}
as Input Value
13 . Click on Create automation once complete
Click here for CloudFormation Console deployment step
Download the template here.
If you decide to deploy the stack from the console, ensure that you complete the following steps:
- Please follow this guide for information on how to deploy the CloudFormation template.
- Use
waopslab-runbook-scale-ecs-service
as the Stack Name, as this is referenced by other stacks later in the lab.
Click here for CloudFormation CLI deployment step (Preferred way)
-
From the Cloud9 terminal, run the command to get into the working script folder.
cd ~/environment/aws-well-architected-labs/static/Operations/200_Automating_operations_with_playbooks_and_runbooks/Code/templates
-
Then run below commands, replacing the 'AutomationRoleArn' with the Arn of AutomationRole you took note in previous step 3.0 Prepare Automation Document IAM Role.
aws cloudformation create-stack --stack-name waopslab-runbook-scale-ecs-service \ --parameters ParameterKey=PlaybookIAMRole,ParameterValue=AutomationRoleArn \ --template-body file://runbook_scale_ecs_service.yml
Example:
aws cloudformation create-stack --stack-name waopslab-runbook-scale-ecs-service \ --parameters ParameterKey=PlaybookIAMRole,ParameterValue=arn:aws:iam::000000000000:role/AutomationRole \ --template-body file://runbook_scale_ecs_service.yml
-
Confirm that the stack has installed correctly. You can do this by running the describe-stacks command below, locate the StackStatus and confirm it is set to CREATE_COMPLETE.
aws cloudformation describe-stacks --stack-name waopslab-runbook-scale-ecs-service
Now, lets run the runbook you created above to remediate the issue.
-
Go to the AWS CloudFormation console.
-
Click on the stack named
walab-ops-sample-application
. -
Click on the Output tab, and take note following output values. You will need these values to execute the runbook.
- OutputECSCluster
- OutputECSService
- OutputSystemOwnersTopicArn
-
If you are currently using an IAM user or role to log into your AWS Console, take note of the ARN. You will need this ARN when executing the runbook to restrict access to approve or deny request capability.
To find your current IAM user ARN, go to the IAM console and click Users on the left side menu, then click on your User name. For IAM role, go to the IAM console and click Roles on the left side menu, then click on the Role name, you are using.
You will see something similar to the example below. Take note of the ARN value,and proceed to the next step.
-
Go to the Systems Manager Automation console, click on Document under Shared Resources, locate and click an automation document called
Runbook-ECS-Scale-Up
. -
Then click Execute automation.
-
Fill in the Input parameters with values below.
-
For ECSServiceName, place the value of OutputECSService you took note on step 3.
-
For ECSClusterName, Place the value of OutputECSCluster you took note on step 3.
-
For ApproverArn, place the ARN value you took note on step 4.
-
For ECSDesiredCount, place in
100
to increase the task number to 100. -
For NotificationMessage, place in any message that can help the approver make an informed decision when approving or denying the requested action.
For example:
Hello, your mysecretword app is experiencing performance degradation. To maintain quality customer experience we will manually scale up the supporting cluster. This action will be approximately 10 minutes after this message is generated unless you do not consent and deny the action within the period.
-
For NotificationTopicArn, place the value of OutputSystemOwnersTopicArn you took note on step 3.
-
For Timer, you can specify
PT5M
or specify a value defined in ISO 8601 duration format.
-
-
Click Execute to run the runbook.
-
Once the runbook is running, you will receive an email with instructions approve or deny, on the email address subscribed to the owners SNS topic ARN. Follow the link in the email using the User of the ApproverArn you placed in the Input parameters. The link will take you to the SSM Console where you can approve or deny the request.
If you approve, or ignore the email, the request will be automatically be approved after the Timer set in the runbook expires. If you deny, the runbook will fail and no action will be taken.
-
Once the runbook completes, you can see that the ECS task count increased to the value specified.
-
Go to ECS console and click on Clusters and select
mysecretword-cluster
. -
Click on the
mysecretword-service
Service, and you will see the number of running tasks increasing to 100 and the average CPUUtilization decrease. -
Subsequently, you will see the API response time returns to normal and the CloudWatch Alarm returns to an OK state.
You can check both using your CloudWatch Console, following the steps you ran in section 2.1 Observing the alarm being triggered.
You have now completed the Automating operations with Playbooks and Runbooks lab, click on the link below to cleanup the lab resources.
In this section you will delete all resources related to the lab environment.
- Run the following command to navigate to the script folder.
cd ~/environment/aws-well-architected-labs/static/Operations/200_Automating_operations_with_playbooks_and_runbooks/Code/scripts/
- Run the teardown_resources.sh script to delete all resources related to the lab.
bash teardown_resources.sh
In this lab you learnt:
- Build and run automated playbooks to support your investigations
- Build and run automated runbooks to remediate specific faults
- Enabling traceability of operations activities in your environment