quota-monitor-for-aws's Introduction

Quota Monitor for AWS

🚀Solution Landing Page | 🚧Feature request | 🐛Bug Report | 📜Documentation Improvement

Note: For any relevant information outside the scope of this readme, please refer to the solution landing page and implementation guide.

Solution overview

Quota Monitor for AWS is a reference implementation that provides a foundation for monitoring AWS service quota usage. Customers can use the solution to monitor quotas for services supported by AWS Trusted Advisor and AWS Service Quotas, across multiple Regions and multiple AWS accounts. The solution integrates with Amazon SNS and Slack to notify customers when service quota usage approaches its thresholds.

Architecture

The architecture consists of several components: some are installed in the monitoring account and the others in the monitored accounts. The monitoring account, also known as the hub account, collects all usage events from the monitored accounts (spokes) and raises notifications. Additionally, all usage events are stored in a DynamoDB table in the hub account, which can be used to view historical trends of resource usage across all accounts.

Deployment scenarios:

The solution follows a hub-and-spoke model and supports the following deployment scenarios:

  • Environments where all AWS accounts are part of your AWS Organization
  • Hybrid environments with AWS Organization and independent AWS accounts
  • Environments not using AWS Organizations

hub: For the first two scenarios, use quota-monitor-hub.template. For environments not using Organizations, use quota-monitor-hub-no-ou.template. Note: The hub template should be deployed in the monitoring account. For the first two scenarios, this account should also be your delegated administrator for StackSets in the organization.

spoke: Spoke templates are automatically deployed by StackSets to the targeted Organizational Units. For hybrid environments and environments not using Organizations, deploy the spoke templates individually in the accounts where monitoring is needed. Note: ta-spoke.template should be deployed in us-east-1 ONLY. sq-spoke.template can be deployed in any region.

Installing pre-packaged solution template

Note: hub, hub-no-ou and sq-spoke templates can be deployed in ANY region; prerequisite and ta-spoke template can be deployed in us-east-1 ONLY.
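
The region rules in the note above can be captured in a small pre-deployment check. The following is only a sketch: the helper name canDeploy and the short template keys are illustrative and are not part of the solution.

```javascript
// Region rules as stated in the note above. "ANY" means no restriction.
const REGION_RULES = new Map([
  ["hub", "ANY"],
  ["hub-no-ou", "ANY"],
  ["sq-spoke", "ANY"],
  ["prerequisite", "us-east-1"],
  ["ta-spoke", "us-east-1"],
]);

// Hypothetical helper: returns true if `template` may be deployed in `region`.
function canDeploy(template, region) {
  const rule = REGION_RULES.get(template);
  if (rule === undefined) {
    throw new Error(`Unknown template: ${template}`);
  }
  return rule === "ANY" || rule === region;
}

console.log(canDeploy("sq-spoke", "eu-west-1")); // true
console.log(canDeploy("ta-spoke", "eu-west-1")); // false
```

A check like this could run in a deployment script before any stack is created, so that a ta-spoke deployment outside us-east-1 fails early.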

Parameters for hub template

  • Deployment Configuration: Choose Organizations or Hybrid based on your use-case
  • Notification Configuration: Choose the notifications you want to receive

Note: Deployment Configuration parameter is not available in hub-no-ou template.

Parameters for spoke templates

  • EventBridge bus ARN: ARN of the EventBridge bus where you want to send usage events

Note: You may leave the remaining parameters at their default values.

Customization

Follow the steps below to customize the solution or extend it with new capabilities.

Setup

  • JavaScript prerequisite: node=v16.17.0 | npm=8.15.0

Clone the repository and run the following commands to install dependencies:

git clone https://github.com/aws-solutions/quota-monitor-for-aws.git
cd ./quota-monitor-for-aws
npm ci

(optional) Run the following commands to format and lint the project per the project standards

npm run prettier-format
npm run lint

Note: The following steps have been tested with the prerequisites above.

Running unit tests for customization

Run the unit tests to make sure your customizations pass.

cd ./deployment
chmod +x ./run-unit-tests.sh
./run-unit-tests.sh

✅ Ensure all unit tests pass. Review the generated coverage report.

Build

To build your customized distributable, run the build from the project root:

npm run build:all

✅ All assets are now built.

Deploy

Run the following command from the root of the project

cd ./source/resources
npm ci

Bootstrap your CDK environment and deploy the stack

npm run cdk -- bootstrap --profile <PROFILE_NAME>
npm run cdk -- deploy <STACK_NAME> --profile <PROFILE_NAME>

To deploy the hub template in org mode, which works with StackSets, run

npm run orgHub:deploy -- deploy quota-monitor-hub --profile <PROFILE_NAME>

Note:

  • STACK_NAME: substitute the name of the stack that you want to deploy (check the CDK app for the stack names)
  • PROFILE_NAME: substitute the name of an AWS CLI profile that contains appropriate credentials for deploying in your preferred region

✅ Solution stack is deployed with your customized code.

Independent spoke templates

There are two spoke templates packaged with the solution

  • ta-spoke: provisions resources to support Trusted Advisor quota checks
  • sq-spoke: provisions resources to support Service Quotas checks

Both spoke templates are independent, standalone stacks that can be deployed individually. You can deploy a spoke stack and route usage events and notifications to your preferred destinations. Additionally, in the sq-spoke stack you can control which services to monitor by toggling the monitored status of the services in the DynamoDB table ServiceTable. To deploy the sq-spoke stack:

npm run cdk -- deploy quota-monitor-sq-spoke --parameters EventBusArn=<BUS_ARN> --profile <PROFILE_NAME>

Note: For BUS_ARN, substitute the ARN of the EventBridge bus where you want to send usage events.
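
Because a mistyped bus ARN only surfaces once the stack is deploying, it can help to validate the value before building the command. The sketch below assumes the usual event bus ARN layout (arn:aws:events:<region>:<account-id>:event-bus/<name>); the helper name, the bus name, and the account ID in the example are placeholders.

```javascript
// Assumed event bus ARN layout: arn:<partition>:events:<region>:<account>:event-bus/<name>
const EVENT_BUS_ARN = /^arn:aws[a-z-]*:events:[a-z0-9-]+:\d{12}:event-bus\/[\w.\/-]+$/;

// Hypothetical helper: builds the sq-spoke deploy command shown above.
function sqSpokeDeployCommand(busArn, profile) {
  if (!EVENT_BUS_ARN.test(busArn)) {
    throw new Error(`Not an EventBridge bus ARN: ${busArn}`);
  }
  return (
    "npm run cdk -- deploy quota-monitor-sq-spoke " +
    `--parameters EventBusArn=${busArn} --profile ${profile}`
  );
}

// Placeholder account ID, bus name, and profile, for illustration only.
console.log(
  sqSpokeDeployCommand(
    "arn:aws:events:us-east-1:111122223333:event-bus/ExampleBus",
    "my-profile"
  )
);
```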

SSM Parameter Store based workflow

The solution provisions the /QuotaMonitor/OUs and /QuotaMonitor/Accounts parameters in SSM Parameter Store. You can modify these parameters at any point after deployment to update the list of organizational units and accounts targeted for monitoring.

  • /QuotaMonitor/OUs: Once you update the parameter, StackSets takes care of deploying the spoke templates in the targeted OUs

  • /QuotaMonitor/Accounts: Once you update the parameter, you need to deploy the spoke templates individually in the targeted accounts
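
As a rough illustration of updating these parameters programmatically, the sketch below builds the input object for an SSM PutParameter call and validates the values first. It makes several assumptions that the README does not confirm: that the parameters are of type StringList, that OU IDs follow the ou-xxxx-xxxxxxxx pattern, and that accounts are listed as 12-digit IDs. The helper name is hypothetical.

```javascript
// Validators keyed by the parameter names the solution provisions.
// The patterns are assumptions based on common AWS identifier formats.
const VALIDATORS = new Map([
  ["/QuotaMonitor/OUs", /^ou-[a-z0-9]{4,32}-[a-z0-9]{8,32}$/],
  ["/QuotaMonitor/Accounts", /^\d{12}$/],
]);

// Hypothetical helper: builds the input for an SSM PutParameter call.
// Type "StringList" is an assumption about how the parameters are stored.
function buildPutParameterInput(name, values) {
  const pattern = VALIDATORS.get(name);
  if (!pattern) {
    throw new Error(`Unexpected parameter name: ${name}`);
  }
  const invalid = values.filter((value) => !pattern.test(value));
  if (invalid.length > 0) {
    throw new Error(`Invalid values for ${name}: ${invalid.join(", ")}`);
  }
  return { Name: name, Type: "StringList", Value: values.join(","), Overwrite: true };
}

// Placeholder OU ID, for illustration only.
console.log(buildPutParameterInput("/QuotaMonitor/OUs", ["ou-ab12-cdefgh34"]));
```

The returned object could then be passed to the SSM client of your choice. Keep in mind that, per the list above, only the OUs parameter triggers an automatic StackSets deployment.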

File Structure

The project consists of several microservices, a shared utility Lambda layer, and CDK resources.

|-deployment/
  |-run-unit-tests.sh             [ run all unit tests ]
  |-templates                     [ solution pre-baked templates ]
|-source/
  |-lambda
    |-services/
      |-cwPoller/                 [ microservice for polling CloudWatch metrics for quotas usage ]
      |-deploymentManager/        [ microservice for managing CloudFormation StackSet deployments ]
      |-helper/                   [ microservice for helper modules ]
      |-preReqManager/            [ microservice for fulfilling pre-requisites in the management account ]
      |-quotaListManager/         [ microservice for managing quota list that supports usage monitoring ]
      |-reporter/                 [ microservice for putting quota usage details on dynamodb ]
      |-slackNotifier/            [ microservice for raising alerts on slack ]
      |-snsPublisher/             [ microservice for publishing alerts to SNS ]
      |-taRefresher/              [ microservice for refreshing trusted advisor checks ]
    |-utilsLayer/                 [ lambda layer with shared modules, like logger, metrics, try/catch wrapper ]
  |-resources                     [ cdk resources to provision infrastructure ]
|-README.md
|-additional_files                [ CHANGELOG, CODE_OF_CONDUCT, LICENSE, NOTICE, sonar-project.properties etc.]

License

See license here

Collection of operational metrics

This solution collects anonymized operational metrics to help AWS improve the quality and features of the solution. For more information, including how to disable this capability, please see the implementation guide.


Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.

Licensed under the Apache License Version 2.0 (the "License"). You may not use this file except in compliance with the License. A copy of the License is located at

http://www.apache.org/licenses/LICENSE-2.0

or in the "license" file accompanying this file. This file is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, express or implied. See the License for the specific language governing permissions and limitations under the License.

quota-monitor-for-aws's People

Contributors

aaronschuetter, abewub, aijunpeng, brandonmorgado, g-lenz, georgebearden, gsingh04, hyandell, iscofield, martinb3, sanjay-amazon, shsenior, tbelmega, trobiv

quota-monitor-for-aws's Issues

deployment/run-unit-tests.sh is erroneous

The first "cd ../source/services/limitreport" works but then the second "cd ../source/services/slacknotify" fails because it wants to change to: "source/services/source/services/slacknotify".
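
The compounding effect of the two relative cd commands can be reproduced with plain path resolution. This sketch assumes a hypothetical checkout at /repo and that the script starts in the deployment directory.

```javascript
const path = require("node:path");

// Simulates `cd <rel>` from `cwd` using POSIX path rules.
function cd(cwd, rel) {
  return path.posix.resolve(cwd, rel);
}

const start = "/repo/deployment"; // hypothetical checkout location

// Each cd is relative to wherever the previous one ended up.
const first = cd(start, "../source/services/limitreport");
const second = cd(first, "../source/services/slacknotify");
console.log(first);  // /repo/source/services/limitreport
console.log(second); // /repo/source/services/source/services/slacknotify

// One possible fix: resolve every service directory against the same base.
const fixed = cd(start, "../source/services/slacknotify");
console.log(fixed);  // /repo/source/services/slacknotify
```

In a shell script, the equivalent fix would be to return to a known directory (or use absolute paths) before each cd.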

Feature Request: custom aws service quotas checks

Trusted Advisor only checks some AWS service limits. Many other AWS service limits are not included.
Is it possible to use a custom template with the limit monitor to check other AWS service limits (those not included in Trusted Advisor)? Do you have an example template for this?

best regards

Monitor is alerting labelling excluded Service Limits from TrustedAdvisor as Warnings

@aws-limit-monitor
Limit Monitor Documentation
AccountId: XXXXXXXXXXXXXX
Status: :warning:
TimeStamp: 2020-01-21T20:00:15Z
Region: us-east-1
Service: RDS
LimitName: Max auths per security group
CurrentUsage: 20
LimitAmount: 20

It seems that even though the service limit has been excluded from the TA service limit check, the alert is still being sent as a warning. As you may know, RDS's "Max auths per security group" is a hard limit.

Where can I get the 'initialize-repo.sh' script?

Hi,
I am planning to run 'build-s3-dist.sh' from the deployment directory. I noticed that the first prerequisite is:

Important notes and prereq's:

1. The initialize-repo.sh script must have been run in order for this script to function properly.

Can anyone please advise where I can get the 'initialize-repo.sh' script mentioned above?

Dipankar Chakrabarti

Incorrect mapping between DynamoDB attributes and TA keys

Example key from latest version of Limit Monitor:

{
  "AccountId": "123456789012",
  "CurrentUsage": "Green", <-
  "ExpiryTime": 1560964756524,
  "LimitAmount": "0", <-
  "LimitName": "1000000", <-
  "MessageId": "02102f5b-01cf-4f88-86b2-3cda37bd1fe5",
  "Region": "ap-south-1",
  "Service": "EBS",
  "Status": "OK",
  "TimeStamp": "2019-06-04T17:16:04Z"
}

Culprit seems to be here, as Slack notifications are referencing proper attributes
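
The item above looks like an off-by-one shift between the Trusted Advisor metadata columns and the DynamoDB attribute names. The sketch below shows a name-based mapping that avoids indexing mistakes. It assumes the Trusted Advisor service limit metadata is ordered as region, service, limit name, limit amount, current usage, status color. The limit name in the example is a placeholder, and this is not the solution's actual code.

```javascript
// Assumed column order of Trusted Advisor service limit metadata.
const TA_COLUMNS = [
  "Region",
  "Service",
  "LimitName",
  "LimitAmount",
  "CurrentUsage",
  "StatusColor", // e.g. "Green"; would still need translating to OK/WARN/ERROR
];

// Maps a metadata array to named attributes, failing loudly on a shape change.
function mapMetadata(metadata) {
  if (metadata.length !== TA_COLUMNS.length) {
    throw new Error(`Expected ${TA_COLUMNS.length} columns, got ${metadata.length}`);
  }
  return Object.fromEntries(TA_COLUMNS.map((name, i) => [name, metadata[i]]));
}

// Values taken from the item above; "Example limit name" is a placeholder.
const item = mapMetadata(["ap-south-1", "EBS", "Example limit name", "1000000", "0", "Green"]);
console.log(item.LimitAmount);  // 1000000
console.log(item.CurrentUsage); // 0
```

With this mapping, "1000000" lands in LimitAmount and "0" in CurrentUsage, which is what the item above appears to have intended.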

Deployment not working - failing cloudwatch events

I have deployed the stack, but I don't believe it is working as expected.

I enabled DEBUG on all the lambdas and I see in the limitCheckStack only the following message:
START RequestId: b3f11639-74d7-439e-9751-896b78fa6c48 Version: $LATEST
18:55:17
2020-02-06T18:55:17.338Z b3f11639-74d7-439e-9751-896b78fa6c48 INFO [ERROR]UnknownEndpoint: Inaccessible host: `servicequotas.eu-north-1.amazonaws.com'. This service may not be available in the `eu-north-1' region.
18:56:09
END RequestId: b3f11639-74d7-439e-9751-896b78fa6c48

There is nothing in the dynamodb table and nothing being sent to either of the SQS. The Troubleshooting section suggests ensuring that the account ID is correct. I have verified the account is correct and "surrounded".

I see from the Lambda > Application > Monitoring > CW Events that all of the cw events have "Failed Invocations". No other monitoring tabs show any errors.

Please assist.

Thanks,
Ryan

Enable support for StackSet SERVICE_MANAGED

A great feature would be support for StackSets with the SERVICE_MANAGED permission model, targeting an Organization OU.

Currently, if I want to deploy spoke stacks through StackSets to an Organization OU, I still need to add each new account to the "Account list" parameter of the master stack.

Additionally, nested stacks are not supported in the StackSets SERVICE_MANAGED permission model.

AccountList Parameter Value Limit

With 100+ accounts loaded in the account list, I am hitting the parameter value limit. A possible solution is to have 2-3 parameter lists and join them.
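
The split-and-join idea can be sketched as follows. This assumes the limit being hit is CloudFormation's 4,096-byte cap on a parameter value and that the list is a plain comma-separated string; both are assumptions, and the function names are illustrative.

```javascript
const MAX_PARAM_BYTES = 4096; // assumed CloudFormation parameter value cap

// Splits account IDs into comma-separated strings that each fit in maxBytes.
function splitAccountList(accounts, maxBytes = MAX_PARAM_BYTES) {
  const chunks = [];
  let current = "";
  for (const id of accounts) {
    const candidate = current ? `${current},${id}` : id;
    if (current && Buffer.byteLength(candidate, "utf8") > maxBytes) {
      chunks.push(current); // current chunk is full, start a new one
      current = id;
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

// Rejoins the per-parameter strings into one list, skipping empty ones.
function joinAccountLists(chunks) {
  return chunks.filter(Boolean).join(",");
}

// Tiny limit used here only to make the split visible.
const parts = splitAccountList(["111111111111", "222222222222", "333333333333"], 25);
console.log(parts); // [ '111111111111,222222222222', '333333333333' ]
console.log(joinAccountLists(parts));
```

In a template, each chunk would map to its own parameter, and the join would happen wherever the full list is consumed.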

use managed policies vs inline

We've deployed this solution and we're now getting notifications from best practices checks about the inline policy used by this solution:
https://github.com/awslabs/aws-limit-monitor/blob/1f365564eae725aeac994bbbd5b2895d76c30332/deployment/service-quotas-checks.template#L57-L75

Would it be possible to change this from an inline policy to a managed policy?

Supporting docs:

The different types of policies are for different use cases. In most cases, we recommend that you use managed policies instead of inline policies.
https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_managed-vs-inline.html

For custom policies, we recommend that you use managed policies instead of inline policies.
https://docs.aws.amazon.com/IAM/latest/UserGuide/best-practices.html#best-practice-managed-vs-inline

EBS limit monitoring

You have exceeded your maximum gp2 storage limit of 600 TiB in this region. Please contact AWS Support to request an Elastic Block Store service limit increase.

It would be nice to get messages when approaching our EBS limits like we do with other EC2 limits

Create Trust Is Failing - PolicyLengthExceededException

Hi,

When deploying the AWS Limit Monitor to 30+ accounts, we get this error from the Lambda function LimtrHelperFunction, which supports the CFn custom resource.

{
    "CreateTrust": {
        "status": "ERROR",
        "account": "xxxxxxxxxxxx",
        "response": {
            "message": "Policy size would be larger than the maximum allowed.",
            "code": "PolicyLengthExceededException",
            "time": "2019-xx-xxTxx:xx:xx.263Z",
            "requestId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
            "statusCode": 400,
            "retryable": false,
            "retryDelay": 89.38078875757718
        }
    }
}

From what I could find out this is happening because we are reaching the limit of 10KB for the permission policy of the event bus.

In the Python SDK documentation [1] (I believe the Node.js SDK behaves the same), I've read the following:

if all the accounts are members of the same AWS organization, you can run PutPermission once specifying Principal as "*" and specifying the AWS organization ID in Condition , to grant permissions to all accounts in that organization

So, is it possible to update the CFn templates and CFn Lambda Helper functions to support using the AWS Org ID in the createTrust function?

[1] - https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/events.html#CloudWatchEvents.Client.put_permission
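
A minimal sketch of what the organization-wide grant described in the quoted documentation could look like as PutPermission input. The field names follow the EventBridge PutPermission API; the helper name, the StatementId format, and the example organization ID are hypothetical, and this is not how the solution's createTrust function is currently written.

```javascript
// Hypothetical helper: one statement that covers every account in the org,
// instead of one statement per account (which grows the policy past its size limit).
function buildOrgPutPermissionInput(orgId, eventBusName = "default") {
  if (!/^o-[a-z0-9]{10,32}$/.test(orgId)) {
    throw new Error(`Not an AWS Organizations ID: ${orgId}`);
  }
  return {
    EventBusName: eventBusName,
    Action: "events:PutEvents",
    Principal: "*",
    StatementId: `limtr-org-${orgId}`, // illustrative naming only
    Condition: {
      Type: "StringEquals",
      Key: "aws:PrincipalOrgID",
      Value: orgId,
    },
  };
}

// Placeholder organization ID.
console.log(buildOrgPutPermissionInput("o-exampleorgid"));
```

Because the policy then holds a single statement regardless of how many accounts join the organization, its size stays constant.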

Due to updates to IAM permissions in EventBridge, existing instructions about adding permissions don't work

I am trying to work out the full policy to add manually to avoid the "Amazon CloudWatch Events Bus Permissions Error" issue. The instructions in the troubleshooter are no longer correct.

Firstly, we can no longer add the permissions in the CloudWatch console; we have to add them in EventBridge. Secondly, adding the permissions for the root/master account alone doesn't fully work; I think I have to do something in the sending account as well as the receiving account.

I am adding something similar to the following in the receiving/master account:

{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "limtr-458054464678",
    "Effect": "Allow",
    "Principal": {
      "AWS": "arn:aws:iam::<ID of master/receiver account with event bus in it>:root"
    },
    "Action": "events:PutEvents",
    "Resource": "arn:aws:events:us-east-1:<ID of master/receiver account with event bus in it>:event-bus/default"
  }, {
    "Sid": "842046a0-70a3-11eb-95a7-93a72639c6a91613513316362",
    "Effect": "Allow",
    "Principal": {
      "AWS": "arn:aws:iam::<ID of secondary/sender account>:root"
    },
    "Action": "events:*",
    "Resource": "arn:aws:events:us-east-1:<ID of master/receiver account with event bus in it>:event-bus/default"
  }]
}

The Limits Report is not accurate

Hello, I have two problems with using this tool to monitor our enterprise AWS account.

  1. The VPC Elastic IP addresses (EIPs) limit
    It was reported as reaching 80% (4/5) on 08/26, and the limit was increased to 6 on 08/27. But the monitor still reported the message below from 08/28 to 08/29 (today), so something seems wrong with this check. Two other limits were also reported, and they did not have this problem after their limits were increased.
    VPC Elastic IP addresses (EIPs)
    Region: us-east-1
    Resource Limit: 5
    Resource Usage: 4
  2. In total, Trusted Advisor reports 92 limits for our AWS account, but it looks like only a few of them are covered by aws-limit-monitor. We also have 3 limits that Trusted Advisor has already alerted on but that aws-limit-monitor did not report. Which limits does this tool cover?
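
The arithmetic behind the first point can be made explicit. The sketch below uses the 80% warning threshold mentioned in the report; treating 100% as ERROR and the function name itself are assumptions made for illustration.

```javascript
// Classifies quota usage. warnRatio defaults to the 80% mentioned above;
// the ERROR cut-off at 100% is an assumption.
function quotaStatus(usage, limit, warnRatio = 0.8) {
  if (!(limit > 0)) {
    throw new Error("limit must be a positive number");
  }
  const ratio = usage / limit;
  if (ratio >= 1) return "ERROR";
  if (ratio >= warnRatio) return "WARN";
  return "OK";
}

console.log(quotaStatus(4, 5)); // WARN (4/5 = 80%)
console.log(quotaStatus(4, 6)); // OK   (4/6 is about 67%)
```

With the limit raised to 6, a usage of 4 no longer crosses the threshold, so a continued warning suggests the check was still evaluating against the old limit of 5, as the reported "Resource Limit: 5" indicates.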

Thanks.

Cannot find module 'uuid/v4'

Got this error:
Cannot find module 'uuid/v4' when invoking the helper function.

Easy fix.
const uuidv4 = require('uuid').v4;

Timeout from limitCheckStack-LimitMonitorFunction every once in a while

It seems that every once in a while (about twice a day) we are getting a timeout from the Lambda.
Looking at the CloudWatch logs, there's nothing there.
In addition, I see that this function consumes ~100-120 MB of RAM, and the limit of that function is 128 MB.

I've added the graphs of the invocation and timeout, and the cloudwatch logs of a timed out invocation:

[Images: invocation and timeout graphs; CloudWatch logs of a timed-out invocation]

This is running the latest version of your code, deployed through the link in the AWS Limit Monitor home page without any modifications.

Cloudformation Failing TARefresher

Seems the limit-monitor-spoke template is now failing because it is creating a function called TARefresher instead of {stackname}-TARefresher-{randomcharacters}

This is preventing me from creating multiple stacks in the same account

Wrong Condition for TASNSRule resource in limit-monitor.template

Using the latest version of your template (5.3.3 at the time of writing), I've found a wrong "Condition" for the "TASNSRule" resource (row 2145 in limit-monitor.template).

It's:
'Condition': 'SlackTrue'

It should be:
'Condition': 'SNSTrue'

If I don't set the Slack-related parameters (that was my use case), the EventBridge rule for SNS will not be created.

Check Deprecation Warning

Hello,

We received a warning from AWS that the following checks are now deprecated and scheduled to be removed on Nov 18th:

This notification will impact you if you are using one of the following checks in AWS Trusted Advisor:

  1. EC2Config Service for EC2 Windows Instances in Fault Tolerance category with Check ID V77iOLlBqz
  2. PV Driver Version for EC2 Windows Instances in Fault Tolerance category with Check ID Wnwm9Il5bG
  3. NVMe Driver Version for EC2 Windows Instances in Fault Tolerance category with Check ID yHAGQJV9K5
  4. ENA Driver Version for EC2 Windows Instances in Fault Tolerance category with Check ID TyfdMXG69d
  5. EBS Active Volumes in Service Limits category with Check ID fH7LL0l7J9

What is the change?

AWS Trusted Advisor will remove 5 checks on November 18, 2020. On this date, 5 checks listed above will be removed from the list of checks in both Support API and the Trusted Advisor console. If you consume results from all checks in Trusted Advisor, these checks will not show up in check results. If you specifically consume any of the 5 checks listed above (such as by hard coding the Check ID in an API call), you will need to remove these checks from your list to avoid API call errors.

We see in the current iteration of aws-limit-monitor that the EBS Active Volumes check seems to be used: https://github.com/awslabs/aws-limit-monitor/blob/master/source/services/tarefresh/lib/ta-refresh.js#L38 - there may be others, this is just the first I spotted.

Are there any plans to update this tool so the above check deprecation doesn't cause breakage?

Thanks!

TypeError running limitMaster.py

Hi,

When trying to execute a Test Event I receive the following error:

cannot concatenate 'str' and 'int' objects: TypeError
Traceback (most recent call last):
File "/var/task/limitMaster.py", line 10, in lambda_handler
print "do stuff here with AWS account "+id+"\n"
TypeError: cannot concatenate 'str' and 'int' objects

Any help would be appreciated.

Cheers,
S

Parameter Store variable is not created

I ran the solution with 4 accounts and set up the Slack configuration from the stack. A Parameter Store entry for Slack was not created, and the Lambda fails when sending alerts:

2019-09-27T00:27:36.225Z 0008c69c-9c4f-4758-9f27-aec000e61988 [ERROR]AccessDeniedException: No access to reserved parameter name: awslimitschecker.
at Request.extractError (/var/runtime/node_modules/aws-sdk/lib/protocol/json.js:51:27)
at Request.callListeners (/var/runtime/node_modules/aws-sdk/lib/sequential_executor.js:106:20)
at Request.emit (/var/runtime/node_modules/aws-sdk/lib/sequential_executor.js:78:10)
at Request.emit (/var/runtime/node_modules/aws-sdk/lib/request.js:683:14)
at Request.transition (/var/runtime/node_modules/aws-sdk/lib/request.js:22:10)
at AcceptorStateMachine.runTo (/var/runtime/node_modules/aws-sdk/lib/state_machine.js:14:12)
at /var/runtime/node_modules/aws-sdk/lib/state_machine.js:26:10
at Request.<anonymous> (/var/runtime/node_modules/aws-sdk/lib/request.js:38:9)
at Request.<anonymous> (/var/runtime/node_modules/aws-sdk/lib/request.js:685:12)
at Request.callListeners (/var/runtime/node_modules/aws-sdk/lib/sequential_executor.js:116:18)

Alerting every 30 minutes

Hello, I don't see a way to change the frequency of the alert.

We are getting an alert every 30 minutes, and it's becoming annoying. We open support tickets to request increases, but sometimes it takes more than 24 hours for the limit to be increased, and in the meantime we keep getting the alerts.

Can we limit the alert to once or twice a day?

Connection refused - Slack Notification

Hello, I am getting the following error in the Slack notification Lambda:

2019-11-05T02:21:27.912Z c58f1545-4b3d-457a-ab90-4ac3f5206fa3 Error: connect ECONNREFUSED 127.0.0.1:443
at Object._errnoException (util.js:1022:11)
at _exceptionWithHostPort (util.js:1044:20)
at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1198:14)

This is the only error I find in the logs.

I receive the email correctly, but Slack notifications don't work.

I followed the official documentation: https://aws.amazon.com/solutions/limit-monitor/

Is anyone else experiencing the same problem?
Thanks

Unable to deploy limit monitor solution - Lambda function logs shows - INFO [ERROR]ParameterNotFound: null

I provided the parameters below while creating the stack:
SlackChannel: slacknotifier
SlackHookURL: https://hooks.slack.com/services/xxxxxx/xxx/xxxx

CloudFormation created only one parameter in Parameter Store, named slacknotifier, with the value slack_dummy.

Result: the Lambda function logs show - INFO [ERROR]ParameterNotFound: null

Please let me know what we need to do to solve this issue, and which key/value pairs need to be created in Parameter Store.

CloudWatch event(SQSRule) invocation failed.

The CloudWatch event (SQSRule) invocation fails when trying to send a message to SQS. It seems SQS uses the AWS managed key alias/aws/sqs for encryption. https://github.com/awslabs/aws-limit-monitor/blob/master/deployment/limit-monitor.template#L225
However, this key is not accessible to CloudWatch Events because of the following condition defined in the key policy.

"Condition": {
    "StringEquals": {
        "kms:CallerAccount": "xxxxxxxxxxxxxxx",
        "kms:ViaService": "sqs.us-east-1.amazonaws.com"
    }
}

I suppose that a customer managed key should be used here, just like in the SNS part. https://github.com/awslabs/aws-limit-monitor/blob/master/deployment/limit-monitor.template#L471

Inconsistency of Email Notification Level and Email Address Required

Email Notification Level
List of alert levels to send email alerts in response to. Leave blank if you do not wish to receive email notifications. Must be double-quoted and comma separated.

Email Address
(Required) The email address to subscribe for alert messages.

It doesn't make sense to me to have the option to leave the email notification level blank if we don't want notifications, and then to also require that an email address be present.

Also, if I try to leave Email Notification Level blank, I get the following error:

Template format error: Unresolved resource dependencies [SNSTopic] in the Resources block of the template

So basically, an Email Notification Level appears to be required.

Slack Notification - Disable - CFN Fails Validation

Instructions Incorrect (https://docs.aws.amazon.com/solutions/latest/limit-monitor/deployment.html#step1)

In the section about Slack Notification Levels:

Choose the status event level(s) that will trigger Slack notifications. For example, “WARN”, “ERROR”. Note that the format is double quotation marks and comma separated (for multiple values).
Note
Leave this parameter blank if you do not want to receive Slack notifications. Note that the Slack notification components will not be deployed.

However, if left blank, the CloudFormation template fails validation, citing a SlackRole dependency check.

Trusted Advisor refresh always runs

It appears that whenever this Lambda executes, it always triggers a Trusted Advisor refresh, which can be problematic for very large accounts. I recommend adding validation that checks whether the results were refreshed within the last X amount of time, so that it does not hit refresh on every run.

This results in the following error:
Unable to retrieve Trusted Advisor results. Some data may be from the previous report.
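
One way the suggested validation could look is sketched below. The record shape (checkId, lastRefreshedMs), the function name, and the one-hour interval are all hypothetical; where the last-refresh time would come from is left open.

```javascript
// Hypothetical record shape: { checkId: string, lastRefreshedMs: number | null }
// Returns the IDs of checks that were never refreshed or whose last refresh
// is at least minIntervalMs old; all other checks are skipped.
function checksDueForRefresh(checks, nowMs, minIntervalMs) {
  return checks
    .filter(
      (check) =>
        check.lastRefreshedMs == null ||
        nowMs - check.lastRefreshedMs >= minIntervalMs
    )
    .map((check) => check.checkId);
}

const ONE_HOUR_MS = 60 * 60 * 1000; // illustrative interval
const now = Date.parse("2020-01-01T12:00:00Z");
const due = checksDueForRefresh(
  [
    { checkId: "check-a", lastRefreshedMs: now - 10 * 60 * 1000 }, // 10 min ago
    { checkId: "check-b", lastRefreshedMs: now - 2 * ONE_HOUR_MS }, // 2 h ago
    { checkId: "check-c", lastRefreshedMs: null },                  // never
  ],
  now,
  ONE_HOUR_MS
);
console.log(due); // [ 'check-b', 'check-c' ]
```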

SSM Parameter name must be a fully qualified name

Version

5.2

Issue

When a slack notification is enabled with the values:

SlackChannel: service-limit-alerts
SlackHookURL: https://hooks.slack.com/services/xxxxxx/xxx/xxx

The following error is logged in CloudWatch:

{
    "SSMParameter": {
        "channelKey": "service-limit-alerts",
        "hookKey": "https://hooks.slack.com/services/xxxxxx/xxx/xxx",
        "response": {
            "message": "Parameter name must be a fully qualified name.",
            "code": "ValidationException",
            "time": "2019-08-08T12:05:04.216Z",
            "requestId": "07cae9a5-1a44-4090-a361-7d070cf7f151",
            "statusCode": 400,
            "retryable": false,
            "retryDelay": 94.35200443904534
        }
    }
}

Investigation

Looking at LimtrHelperFunction -> lib/index.js reveals that the SSM parameter value defaults to SLACK_DUMMY:

await Promise.all(
  data.InvalidParameters.map(async ssmParam => {
    console.log(ssmParam);
    await ssm
      .putParameter({
        Name: ssmParam /* required */,
        Type: 'String' /* required */,
        Value: 'SLACK_DUMMY' /* required */,
      })
      .promise();
  })
);

Also, there is this block of code in SlackNotifier -> lib/slack-notify.js:

self.ssm.getParameter(
  {
    Name: _self.slackHookURL,
    WithDecryption: true,
  },
  function(err, _hookData) {
    if (err) {
      LOGGER.log('ERROR', err.stack);
      return cb(err, null); // an error occurred
    } else {
      let _slackURL = _hookData.Parameter.Value;

      let _slackMssg = _self.slackMessageBuilder(event);
      _slackMssg.channel = _channelData.Parameter.Value;
      ...
    }

This means that whatever value is retrieved for SlackChannel and SlackHookURL is what is used to send a message to Slack; in both cases, SSM would return SLACK_DUMMY.

Potential Solution

I think what needs to be done is to allow SlackChannel, SlackHookURL and their corresponding SSM keys SlackChannelKey, SlackHookURLKey to be specified as SSMParameters -> Properties, and then stored as name and value pairs in SSM.

Happy to work on a fix, if you want!
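
To illustrate why the hook URL is rejected as a parameter name, here is a rough local approximation of the SSM naming rules: a name that contains "/" must begin with "/", and only letters, digits, and the characters . - _ / are allowed. The function is a sketch, not the service's actual validation, and the /limit-monitor/... names are made up for the example.

```javascript
// Approximates SSM parameter name checks locally (not the service's real logic).
function checkParameterName(name) {
  // Hierarchical names (containing "/") must be fully qualified.
  if (name.includes("/") && !name.startsWith("/")) {
    return "not fully qualified";
  }
  if (!/^[a-zA-Z0-9_.\/-]+$/.test(name)) {
    return "invalid characters";
  }
  return "ok";
}

console.log(checkParameterName("service-limit-alerts")); // ok
console.log(checkParameterName("https://hooks.slack.com/services/xxxxxx/xxx/xxx")); // not fully qualified
console.log(checkParameterName("/limit-monitor/slack/hook-url")); // ok (made-up name)
```

This matches the logged error: the hook URL is being used as the parameter name rather than as its value, and a URL contains slashes without a leading one.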

NoSuchResourceException

Hello,

We have implemented the aws-limit-monitor and its spokes in a lot of our accounts. But the Lambda LimitMonitorFunction has some problems. It is throwing NoSuchResourceException and keeps running for a looooong time. I have set the logging level to DEBUG and these logs are repeated in CloudWatch over and over.

| 2020-08-10T16:17:56.785+02:00 | START RequestId: 306fe37b-5bf2-4915-8402-dd0f07c03234 Version: $LATEST
| 2020-08-10T16:17:56.789+02:00 | 2020-08-10T14:17:56.789Z 306fe37b-5bf2-4915-8402-dd0f07c03234 INFO [DEBUG]Received event: { "version": "0", "id": "61c3ab5c-ce1e-a92b-ca1c-b3071460314e", "detail-type": "Scheduled Event", "source": "aws.events", "account": "451413662958", "time": "2020-08-10T14:17:31Z", "region": "us-east-1", "resources": [ "arn:aws:events:us-east-1:451413662958:rule/aws-limit-monitor-spoke-limitCh-LimitCheckSchedule-1393J5ZZR9EQZ" ], "detail": {} }
| 2020-08-10T16:17:59.639+02:00 | 2020-08-10T14:17:59.639Z	306fe37b-5bf2-4915-8402-dd0f07c03234	INFO	[ERROR]NoSuchResourceException: The request failed because the specified service does not exist.
| 2020-08-10T16:18:45.604+02:00 | 2020-08-10T14:18:45.603Z 306fe37b-5bf2-4915-8402-dd0f07c03234 INFO [DEBUG]Running On-Demand Standard (A, C, D, H, I, M, R, T, Z) instances
| 2020-08-10T16:18:53.823+02:00 | END RequestId: 306fe37b-5bf2-4915-8402-dd0f07c03234
| 2020-08-10T16:18:53.823+02:00 | REPORT RequestId: 306fe37b-5bf2-4915-8402-dd0f07c03234 Duration: 57034.52 ms Billed Duration: 57100 ms Memory Size: 128 MB Max Memory Used: 110 MB

We are getting notifications in Slack. But this adds unnecessary cost, since the Lambda runs for 57 seconds multiple times per hour.

Sorry if this is a user error from my side. But (I think) I have followed the guide correctly while setting it up.

Spoke Template has Hard-Coded References to US-East-1 Region

Team AWS / Limit Monitor Solution-

First and foremost - thanks for this helpful tool / solution. We have multiple internal partners in our enterprise who are keenly interested in this service (and more).

We struggled a bit with the deployment - and along the way, found some hard-coded references in the Spoke Template which assume that the centralized (main template) resources in the Primary account are deployed in us-east-1. At first, we simply updated these references to point to us-west-2, where we had Deployed the primary stack in a centralized account. When things didn't work, we cut bait, went back to the 'stock' templates provided in the Solution, and deployed everything in us-east-1.

Happy to submit a PR to address this hard-coded dependency, either by allowing the user to 'select' the region for the centralized / 'back-end' services, or by changing the hard-coded references to intrinsic functions and just looking for those resources in the same region in which the spoke template is executed.

Thanks again for this Solution! We hope to use it as a foundation, incorporating additional limit monitoring not currently provided by Trusted Advisor, but feeding into the same collection / notification / archiving back-end - and will be sure to offer any enhancements developed back to the solution in the form of a PR.

Getting Multiple Slack Messages Per Day for Same LimitName

First of all, thanks so much for this! It has been a huge help, and we are now on top of our service limits. I did want to note that we are getting the same alert four or more times per day in the Slack channel, which is strange because the CloudWatch event is set to run only once per day. Another thing is that the DynamoDB table is filling up with old readings. I see a TTL column, but does anything lifecycle out those old DynamoDB records after a month or some other period? I don't see much advantage in keeping those historical numbers. What purpose does the ExpiryTime (TTL) column serve in the table, if I may ask?
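As background for the TTL question: DynamoDB's Time to Live feature, when enabled on a table, expects a Number attribute holding a Unix epoch timestamp in seconds and removes items some time after that timestamp has passed. The sketch below shows how such a value could be computed and checked. It assumes that ExpiryTime is the table's TTL attribute and uses a 30-day retention period; neither is confirmed by the post, and the helper names are hypothetical.

```python
import time

SECONDS_PER_DAY = 86_400


def expiry_time(now_epoch: int, retention_days: int) -> int:
    """Epoch-seconds timestamp after which an item becomes eligible for deletion."""
    return now_epoch + retention_days * SECONDS_PER_DAY


def is_expired(item: dict, now_epoch: int) -> bool:
    """True once the current time has moved past the item's ExpiryTime."""
    return int(item["ExpiryTime"]) < now_epoch


if __name__ == "__main__":
    now = int(time.time())
    # Hypothetical item shape; only ExpiryTime is named in the post.
    item = {"LimitName": "example", "ExpiryTime": expiry_time(now, 30)}
    print(item["ExpiryTime"], is_expired(item, now))
```

If old readings keep accumulating, one thing worth checking is whether TTL is actually enabled on the table for that attribute, since the column by itself does not delete anything.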

README.md couple of improvements

There seems to be a missing command and some other improvements which could be made to improve the README.md file for new developers.

1) Remove newline characters

Newline characters are not needed in bash:

cd ./deployment
chmod +x ./run-unit-tests.sh  \n
./run-unit-tests.sh \n

They should either be removed:

cd ./deployment
chmod +x ./run-unit-tests.sh
./run-unit-tests.sh

Or, if this is intended to be a single multiline command:

cd ./deployment && \
  chmod +x ./run-unit-tests.sh && \
  ./run-unit-tests.sh

2) Include missing command

It is not clear which dist folder this refers to:

aws s3 cp ./dist/ s3://my-bucket-name/limit-monitor/latest/ --recursive --exclude "*" --include "*.template" --acl bucket-owner-full-control --profile aws-cred-profile-name

Would be better to include the cd command to make it clear:

cd ../source/lambda/services/limitreport
aws s3 cp ./dist/ s3://my-bucket-name/limit-monitor/latest/ --recursive --exclude "*" --include "*.template" --acl bucket-owner-full-control --profile aws-cred-profile-name

Or, if relative paths can be passed through, this would be clearer:

aws s3 cp ../source/lambda/services/limitreport/dist/ s3://my-bucket-name/limit-monitor/latest/ --recursive --exclude "*" --include "*.template" --acl bucket-owner-full-control --profile aws-cred-profile-name

Not getting a notification after my SNS topics reached their limit

Hi,
I set up the AWS limit checker for my AWS account, but when our SNS topic limit was reached,
I didn't get any email from it.
What is the issue here? Am I missing something, or is there a bug in the code?

(ap-southeast-1)
If you need more details, I can share them with you.

Lambda Servicequotas Permission Issue

New spoke deploy.

INFO [ERROR]AccessDeniedException: User: arn:aws:sts::[account id]:assumed-role/limit-monitor-limitCheckStack-GQH-LimitMonitorRole-1HLZ084R28ABB/limit-monitor-limitCheckStack-LimitMonitorFunction-ENB26B3ETQB8 is not authorized to perform: servicequotas:GetAWSDefaultServiceQuota with an explicit deny

The IAM role attached to the Lambda function allows servicequotas:GetAWSDefaultServiceQuota on all resources, so I don't understand why it's getting an explicit deny.
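One general point that may help with debugging: in IAM policy evaluation, an explicit Deny in any applicable policy overrides every Allow, so an Allow on the role does not rule out a Deny coming from somewhere else (for example an organization-level policy or a permissions boundary). The toy evaluator below illustrates only that precedence rule. It is a deliberately simplified sketch, not the real evaluation logic, and the guardrail policy is a hypothetical example rather than something known about this account.

```python
from fnmatch import fnmatchcase


def decide(action: str, policies: list) -> str:
    """Simplified precedence: explicit deny > allow > implicit deny."""
    allowed = False
    for policy in policies:
        for statement in policy:
            matches = any(fnmatchcase(action, pattern)
                          for pattern in statement["Action"])
            if not matches:
                continue
            if statement["Effect"] == "Deny":
                return "explicit deny"  # a matching Deny always wins
            allowed = True
    return "allow" if allowed else "implicit deny"


# The role's own policy, as described in the post.
role_policy = [
    {"Effect": "Allow", "Action": ["servicequotas:GetAWSDefaultServiceQuota"]},
]

# Hypothetical policy attached elsewhere that denies the whole service.
guardrail_policy = [
    {"Effect": "Deny", "Action": ["servicequotas:*"]},
]

if __name__ == "__main__":
    action = "servicequotas:GetAWSDefaultServiceQuota"
    print(decide(action, [role_policy]))
    print(decide(action, [role_policy, guardrail_policy]))
```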

Add ability to mute alerts

There are situations where the limits are perfectly OK and there is no need to increase them. Example: EC2 on-demand instances.

Would like to have an ack button in the Slack notification that will disable future alerts for that particular limit.
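To make the request concrete, the muting logic could be as small as a lookup before a notification is sent. The sketch below is purely illustrative: the event fields, the muted set, and the function names are hypothetical and do not correspond to anything in the solution today, and how the ack button would persist a mute is left open.

```python
def limit_key(event: dict) -> tuple:
    """Identify a limit by service and limit name (hypothetical event shape)."""
    return (event["service"], event["limit_name"])


def should_notify(event: dict, muted: set) -> bool:
    """Skip the notification when the limit has been acknowledged."""
    return limit_key(event) not in muted


if __name__ == "__main__":
    # An ack in Slack would add the key of the acknowledged limit to this set.
    muted = {("EC2", "On-Demand instances")}
    events = [
        {"service": "EC2", "limit_name": "On-Demand instances"},
        {"service": "SNS", "limit_name": "Topics"},
    ]
    for event in events:
        print(limit_key(event), should_notify(event, muted))
```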
