
CDK Monitoring Constructs


Easy-to-use CDK constructs for monitoring your AWS infrastructure with Amazon CloudWatch.

  • Easily add commonly-used alarms using predefined properties
  • Generate concise CloudWatch dashboards that indicate your alarms
  • Extend the library with your own extensions or custom metrics
  • Consume the library in multiple supported languages

Installation

TypeScript

https://www.npmjs.com/package/cdk-monitoring-constructs

In your package.json:

{
  "dependencies": {
    "cdk-monitoring-constructs": "^7.0.0",

    // peer dependencies of cdk-monitoring-constructs
    "aws-cdk-lib": "^2.112.0",
    "constructs": "^10.0.5"

    // ...your other dependencies...
  }
}
Java

See https://mvnrepository.com/artifact/io.github.cdklabs/cdkmonitoringconstructs

Python

See https://pypi.org/project/cdk-monitoring-constructs/

C#

See https://www.nuget.org/packages/Cdklabs.CdkMonitoringConstructs/

Features

You can browse the documentation at https://constructs.dev/packages/cdk-monitoring-constructs/

Item Monitoring Alarms Notes
AWS API Gateway (REST API) (.monitorApiGateway()) TPS, latency, errors Latency, error count/rate, low/high TPS To see metrics, you have to enable Advanced Monitoring
AWS API Gateway V2 (HTTP API) (.monitorApiGatewayV2HttpApi()) TPS, latency, errors Latency, error count/rate, low/high TPS To see route level metrics, you have to enable Advanced Monitoring
AWS AppSync (GraphQL API) (.monitorAppSyncApi()) TPS, latency, errors Latency, error count/rate, low/high TPS
Amazon Aurora (.monitorAuroraCluster()) Query duration, connections, latency, CPU usage, Serverless Database Capacity Connections, Serverless Database Capacity and CPU usage
AWS Billing (.monitorBilling()) AWS account cost Total cost (anomaly) Requires enabling the Receive Billing Alerts option in AWS Console / Billing Preferences
AWS Certificate Manager (.monitorCertificate()) Certificate expiration Days until expiration
AWS CloudFront (.monitorCloudFrontDistribution()) TPS, traffic, latency, errors Error rate, low/high TPS
AWS CloudWatch Logs (.monitorLog()) Patterns present in the log group Minimum incoming logs
AWS CloudWatch Synthetics Canary (.monitorSyntheticsCanary()) Latency, error count/rate Error count/rate, latency
AWS CodeBuild (.monitorCodeBuildProject()) Build counts (total, successful, failed), failed rate, duration Failed build count/rate, duration
AWS DocumentDB (.monitorDocumentDbCluster()) CPU, throttling, read/write latency, transactions, cursors CPU
AWS DynamoDB (.monitorDynamoTable()) Read and write capacity provisioned / used Consumed capacity, throttling, latency, errors
AWS DynamoDB Global Secondary Index (.monitorDynamoTableGlobalSecondaryIndex()) Read and write capacity, indexing progress, throttled events
AWS EC2 (.monitorEC2Instances()) CPU, disk operations, network
AWS EC2 Auto Scaling Groups (.monitorAutoScalingGroup()) Group size, instance status
AWS ECS (.monitorFargateService(), .monitorEc2Service(), .monitorSimpleFargateService(), .monitorSimpleEc2Service(), .monitorQueueProcessingFargateService(), .monitorQueueProcessingEc2Service()) System resources and task health Unhealthy task count, running tasks count, CPU/memory usage, and bytes processed by load balancer (if any) Use for ecs-patterns load balanced ec2/fargate constructs (NetworkLoadBalancedEc2Service, NetworkLoadBalancedFargateService, ApplicationLoadBalancedEc2Service, ApplicationLoadBalancedFargateService)
AWS ElastiCache (.monitorElastiCacheCluster()) CPU/memory usage, evictions and connections CPU, memory, items count
AWS Glue (.monitorGlueJob()) Traffic, job status, memory/CPU usage Failed/killed task count/rate
AWS Kinesis Data Analytics (.monitorKinesisDataAnalytics()) Up/Downtime, CPU/memory usage, KPU usage, checkpoint metrics, and garbage collection metrics Downtime, full restart count
AWS Kinesis Data Stream (.monitorKinesisDataStream()) Put/Get/Incoming Record/s and Throttling Throttling, throughput, iterator max age
AWS Kinesis Firehose (.monitorKinesisFirehose()) Number of records, requests, latency, throttling Throttling
AWS Lambda (.monitorLambdaFunction()) Latency, errors, iterator max age Latency, errors, throttles, iterator max age Optional Lambda Insights metrics (opt-in) support
AWS Load Balancing (.monitorNetworkLoadBalancer(), .monitorFargateApplicationLoadBalancer(), .monitorFargateNetworkLoadBalancer(), .monitorEc2ApplicationLoadBalancer(), .monitorEc2NetworkLoadBalancer()) System resources and task health Unhealthy task count, running tasks count, (for Fargate/Ec2 apps) CPU/memory usage Use for FargateService or Ec2Service backed by a NetworkLoadBalancer or ApplicationLoadBalancer
AWS OpenSearch/Elasticsearch (.monitorOpenSearchCluster(), .monitorElasticsearchCluster()) Indexing and search latency, disk/memory/CPU usage Indexing and search latency, disk/memory/CPU usage, cluster status, KMS keys
AWS RDS (.monitorRdsCluster()) Query duration, connections, latency, disk/CPU usage Connections, disk and CPU usage
AWS RDS (.monitorRdsInstance()) Query duration, connections, latency, disk/CPU usage Connections, disk and CPU usage
AWS Redshift (.monitorRedshiftCluster()) Query duration, connections, latency, disk/CPU usage Query duration, connections, disk and CPU usage
AWS S3 Bucket (.monitorS3Bucket()) Bucket size and number of objects
AWS SecretsManager (.monitorSecretsManager()) Max secret count, min secret count, secret count change Min/max secret count or change in secret count
AWS SecretsManager Secret (.monitorSecretsManagerSecret()) Days since last rotation Days since last change or rotation
AWS SNS Topic (.monitorSnsTopic()) Message count, size, failed notifications Failed notifications, min/max published messages
AWS SQS Queue (.monitorSqsQueue(), .monitorSqsQueueWithDlq()) Message count, age, size Message count, age, DLQ incoming messages
AWS Step Functions (.monitorStepFunction(), .monitorStepFunctionActivity(), .monitorStepFunctionLambdaIntegration(), .monitorStepFunctionServiceIntegration()) Execution count and breakdown per state Duration, failed, failed rate, aborted, throttled, timed out executions
AWS Web Application Firewall (.monitorWebApplicationFirewallAclV2()) Allowed/blocked requests Blocked requests count/rate
FluentBit (.monitorFluentBit()) Num of input records, Output failures & retries, Filter metrics, Storage metrics FluentBit needs proper configuration with metrics enabled: Official sample configuration. This function creates MetricFilters to publish all FluentBit metrics.
Custom metrics (.monitorCustom()) Addition of custom metrics into the dashboard (each group is a widget) Supports anomaly detection

Getting started

Create a facade

Important note: please do NOT import anything from the /dist/lib package. This is unsupported and might break at any time.

  1. Create an instance of MonitoringFacade, which is the main entrypoint.
  2. Call methods on the facade like .monitorLambdaFunction() and chain them together to define your monitors. You can also use methods to add your own widgets, headers of various sizes, and more.

For examples of monitoring different resources, refer to the unit tests.

export interface MonitoringStackProps extends DeploymentStackProps {
  // ...
}

// This could be in the same stack as your resources, as a nested stack, or a separate stack as you see fit
export class MonitoringStack extends DeploymentStack {
  constructor(parent: App, name: string, props: MonitoringStackProps) {
    super(parent, name, props);

    const monitoring = new MonitoringFacade(this, "Monitoring", {
      // Defaults are provided for these, but they can be customized as desired
      metricFactoryDefaults: { ... },
      alarmFactoryDefaults: { ... },
      dashboardFactory: { ... },
    });

    // Monitor your resources
    monitoring
      .addLargeHeader("Storage")
      .monitorDynamoTable({ /* Monitor a DynamoDB table */ })
      .monitorDynamoTable({ /* and a different table */ })
      .monitorLambdaFunction({ /* and a Lambda function */ })
      .monitorCustom({ /* and some arbitrary metrics in CloudWatch */ })
      // ... etc.
  }
}

Customize actions

Alarms should have an action set up; otherwise, they are not very useful. Currently, we support notifying an SNS topic.

const onAlarmTopic = new Topic(this, "AlarmTopic");

const monitoring = new MonitoringFacade(this, "Monitoring", {
  // ...other props
  alarmFactoryDefaults: {
    // ....other props
    action: new SnsAlarmActionStrategy({ onAlarmTopic }),
  },
});

You can override the default topic for any alarm like this:

monitoring
  .monitorSomething(something, {
    addSomeAlarm: {
      Warning: {
        // ...other props
        threshold: 42,
        actionOverride: new SnsAlarmActionStrategy({ onAlarmTopic }),
      }
    }
  });

Custom metrics

To add custom metrics, use .monitorCustom() and specify your own title and metric groups. Each metric group is rendered as a single graph widget, and all widgets are placed next to each other. All widgets have the same size, which is chosen based on the number of groups to maximize dashboard space usage.

Custom metric monitoring can be created for simple metrics, simple metrics with anomaly detection and search metrics. The first two also support alarming.

Below are a couple of examples. They assume existing metric variables such as m1 and m2. These can be created either by hand (new Metric({...})) or (preferably) with the metricFactory obtained from the facade. The advantage of the shared metricFactory is that you do not need to worry about the period and other defaults.

// Create a metric manually...
const m1 = new Metric(/* ... */);

// ...or (preferably) using the metric factory obtained from the facade
const metricFactory = monitoringFacade.createMetricFactory();
const m2 = metricFactory.createMetric(/* ... */);
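For the first mode (plain metrics, without anomaly detection or search), a group simply lists its metrics. The following is a sketch matching the shape of the examples below, assuming m1 and m2 are existing Metric instances:

```typescript
// Sketch only: a plain custom metric group rendered as a single graph widget.
// m1 and m2 are assumed to be Metric instances created as shown above.
monitorCustom({
  title: "Simple metrics",
  metrics: [m1, m2]
})
```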

Example: metric with anomaly detection

In this case, only one metric is supported. Multiple metrics cannot be rendered with anomaly detection in a single widget due to a CloudWatch limitation.

monitorCustom({
  title: "Metric with anomaly detection",
  metrics: [
    {
      metric: m1,
      anomalyDetectionStandardDeviationToRender: 3
    }
  ]
})

Adding an alarm:

monitorCustom({
  title: "Metric with anomaly detection and alarm",
  metrics: [
    {
      metric: m1,
      alarmFriendlyName: "MetricWithAnomalyDetectionAlarm",
      anomalyDetectionStandardDeviationToRender: 3,
      addAlarmOnAnomaly: {
        Warning: {
          standardDeviationForAlarm: 4,
          alarmWhenAboveTheBand: true,
          alarmWhenBelowTheBand: true
        }
      }
    }
  ]
})

Example: search metrics

monitorCustom({
  title: "Metric search",
  metrics: [
    {
      searchQuery: "My.Prefix.",
      dimensionsMap: {
        FirstDimension: "FirstDimensionValue",
        // Allow any value for the given dimension (pardon the weird typing to satisfy DimensionsMap)
        SecondDimension: undefined as unknown as string
      },
      statistic: MetricStatistic.SUM,
    }
  ]
})

Search metrics do not support setting an alarm, which is a CloudWatch limitation.

Route53 Health Checks

Route53 has strict requirements as to which alarms are allowed to be referenced in Health Checks. You can adjust the metric for an alarm so that it can be used in a Route53 Health Check as follows:

monitoring
  .monitorSomething(something, {
    addSomeAlarm: {
      Warning: {
        // ...other props
        metricAdjuster: Route53HealthCheckMetricAdjuster.INSTANCE,
      }
    }
  });

This ensures the alarm can be used in a Route53 Health Check; otherwise, an Error is thrown indicating why the alarm can't be used. To easily find your Route53 Health Check alarms later, you can apply a custom tag to them as follows:

import { CfnHealthCheck } from "aws-cdk-lib/aws-route53";

monitoring
  .monitorSomething(something, {
    addSomeAlarm: {
      Warning: {
        // ...other props
        customTags: ["route53-health-check"],
        metricAdjuster: Route53HealthCheckMetricAdjuster.INSTANCE,
      }
    }
  });

const alarms = monitoring.createdAlarmsWithTag("route53-health-check");

const healthChecks = alarms.map(({ alarm }) => {
  const id = getHealthCheckConstructId(alarm);

  return new CfnHealthCheck(scope, id, {
    healthCheckConfig: {
      // ...other props
      type: "CLOUDWATCH_METRIC",
      alarmIdentifier: {
        name: alarm.alarmName,
        region: alarm.stack.region,
      },
    },
  });
});

Custom monitoring segments

If you want even more flexibility, you can create your own segment.

This is a general procedure on how to do it:

  1. Extend the Monitoring class
  2. Override the widgets() method (and/or similar ones)
  3. Leverage the metric factory and alarm factory provided by the base class (you can create additional factories if you wish)
  4. Register all alarms with .addAlarm() so they are visible to the user and placed on the alarm summary dashboard

Monitoring subclasses are dashboard segments, so you can add yours to your monitoring by calling .addSegment() on the MonitoringFacade.
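The steps above can be sketched as follows. This is a sketch only, not library-verified code: the exact signatures of Monitoring, MonitoringScope, and the factory helpers may differ between versions, and the props shown are illustrative.

```typescript
// Sketch of a custom monitoring segment (names and signatures illustrative).
export class MyCustomMonitoring extends Monitoring {
  constructor(scope: MonitoringScope, props: { alarmFriendlyName: string }) {
    super(scope);

    // 3. Leverage the factories provided by the scope.
    const metricFactory = scope.createMetricFactory();
    const alarmFactory = scope.createAlarmFactory(props.alarmFriendlyName);

    // 4. Register any alarms via this.addAlarm(...) so they appear on the
    //    alarm summary dashboard.
  }

  // 2. Override widgets() to return whatever should be rendered.
  widgets(): IWidget[] {
    return [
      // ...your widgets...
    ];
  }
}

// Then add it to the facade:
// monitoringFacade.addSegment(new MyCustomMonitoring(monitoringFacade, { alarmFriendlyName: "MyCustom" }));
```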

Modifying or omitting widgets from default dashboard segments

While the dashboard widgets defined in the library are meant to cover most use cases, they might not be what you're looking for.

To modify the widgets:

  1. Extend the appropriate Monitoring class (e.g., LambdaFunctionMonitoring for monitorLambdaFunction) and override the relevant methods (e.g., widgets):
    export class MyCustomizedLambdaFunctionMonitoring extends LambdaFunctionMonitoring {
      widgets(): IWidget[] {
        return [
          // Whatever widgets you want instead of what LambdaFunctionMonitoring has
        ];
      }
    }
  2. Use the facade's addSegment method with your custom class:
    declare const facade: MonitoringFacade;
    
    facade.addSegment(new MyCustomizedLambdaFunctionMonitoring(facade, {
      // Props for LambdaFunctionMonitoring
    }));

Custom dashboards

If you want even more flexibility, you can take complete control over dashboard generation by leveraging dynamic dashboarding features. This allows you to create an arbitrary number of dashboards while configuring each of them separately. You can do this in three simple steps:

  1. Create a dynamic dashboard factory
  2. Create IDynamicDashboardSegment implementations
  3. Add Dynamic Segments to your MonitoringFacade

Create a dynamic dashboard factory

The below code sample will generate two dashboards with the following names:

  • ExampleDashboards-HostedService
  • ExampleDashboards-Infrastructure

// create the dynamic dashboard factory.
const factory = new DynamicDashboardFactory(stack, "DynamicDashboards", {
  dashboardNamePrefix: "ExampleDashboards",
  dashboardConfigs: [
    // 'name' is the minimum required configuration
    { name: "HostedService" },
    // below is an example of additional dashboard-specific config options
    {
      name: "Infrastructure",
      range: Duration.hours(3),
      periodOverride: PeriodOverride.AUTO,
      renderingPreference: DashboardRenderingPreference.BITMAP_ONLY
    },
  ],
});

Create IDynamicDashboardSegment implementations

For each construct you want monitored, you will need to create an implementation of an IDynamicDashboardSegment. The following is a basic reference implementation as an example:

export enum DashboardTypes {
  HostedService = "HostedService",
  Infrastructure = "Infrastructure",
}

class ExampleSegment implements IDynamicDashboardSegment {
  widgetsForDashboard(name: string): IWidget[] {
    // this logic is what's responsible for allowing your dynamic segment to return
    // different widgets for different dashboards
    switch (name) {
      case DashboardTypes.HostedService:
        return [new TextWidget({ markdown: "This shows metrics for your service hosted on AWS Infrastructure" })];
      case DashboardTypes.Infrastructure:
        return [new TextWidget({ markdown: "This shows metrics for the AWS Infrastructure supporting your hosted service" })];
      default:
        throw new Error("Unexpected dashboard name!");
    }
  }
}

Add Dynamic Segments to MonitoringFacade

When you have instances of an IDynamicDashboardSegment to use, they can be added to your dashboard like this:

monitoring.addDynamicSegment(new ExampleSegment());

Now, this widget will be added to both dashboards and will show different content depending on the dashboard. Using the above example code, two dashboards will be generated with the following content:

  • Dashboard Name: "ExampleDashboards-HostedService"
    • Content: "This shows metrics for your service hosted on AWS Infrastructure"
  • Dashboard Name: "ExampleDashboards-Infrastructure"
    • Content: "This shows metrics for the AWS Infrastructure supporting your hosted service"

Cross-account cross-Region Dashboards

Facades can be configured for different regions/accounts as a whole:

new MonitoringFacade(stack, "Monitoring", {
  metricFactoryDefaults: {
    // Different region/account than what you're deploying to
    region: "us-west-2",
    account: "01234567890",
  }
});

Or at a more granular level:

monitoring
  .monitorDynamoTable({
    // Table from the same account/region
    table: Table.fromTableName(stack, "ImportedTable", "MyTableName"),
  })
  .monitorDynamoTable({
    // Table from another account/region
    table: Table.fromTableArn(
      stack,
      "XaXrImportedTable",
      "arn:aws:dynamodb:us-west-2:01234567890:table/my-other-table",
    ),
    region: "us-west-2",
    account: "01234567890",
  });

The order of precedence of the region/account values is:

  1. The individual metric factory's props (e.g. via the monitorDynamoTable props).
  2. The facade's metricFactoryDefaults props.
  3. The region/account that the stack is deployed to.
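Illustratively, the resolution behaves like the small helper below (resolveRegion is a hypothetical name for this sketch, not part of the library):

```typescript
// Models the precedence above: per-monitor props win over facade defaults,
// which win over the stack's own region. Hypothetical helper, for illustration.
function resolveRegion(
  monitorProps: { region?: string },
  facadeDefaults: { region?: string },
  stackRegion: string,
): string {
  return monitorProps.region ?? facadeDefaults.region ?? stackRegion;
}

// Per-monitor value takes priority:
console.log(resolveRegion({ region: "us-west-2" }, { region: "eu-west-1" }, "us-east-1")); // "us-west-2"
// Otherwise fall back to the facade default, then the stack's region:
console.log(resolveRegion({}, { region: "eu-west-1" }, "us-east-1")); // "eu-west-1"
console.log(resolveRegion({}, {}, "us-east-1")); // "us-east-1"
```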

Note that while this allows for cross-account cross-Region dashboarding, cross-Region alarming is not supported by CloudWatch.

Monitoring scopes

You can monitor complete CDK construct scopes using an aspect. It will automatically discover all monitorable resources within the scope recursively and add them to your dashboard.

monitoring.monitorScope(stack, {
  // With optional configuration
  lambda: {
    props: {
      addLatencyP50Alarm: {
        Critical: { maxLatency: Duration.seconds(10) },
      },
    },
  },

  // Some resources that aren't dependent on nodes (e.g. general metrics across instances/account) may be included
  // by default, which can be explicitly disabled.
  billing: { enabled: false },
  ec2: { enabled: false },
  elasticCache: { enabled: false },
});

Contributing

See CONTRIBUTING for more information.

Security policy

See SECURITY for more information.

License

This project is licensed under the Apache-2.0 License.

cdk-monitoring-constructs's People

Contributors

aakarshansingla, amazon-auto, ansk0, arturitov, ayush987goyal, ayvazj, bigbadtrumpet, cdklabs-automation, dependabot[bot], dicee, djdawson3, echeung-amzn, edisongustavo, glyphack, gnomex909, hassanazharkhan, jeremy-zou, jiashuchen, joejemmely, jonathan-luo, ketsugi, laxenade, marcogrcr, mattserrano, miloszwatroba, mitchell-cook, mrgrain, rhermes62, voho, zqumei0


cdk-monitoring-constructs's Issues

AWSGlueMetrics- Change type for dimension `glue.driver.aggregate.numCompletedStages`

Version

"@amzn/monitoring" = '^6.3.0'

Steps and/or minimal code example to reproduce

  1. Using .monitorGlueJob({ jobName: 'jobName', humanReadableName: 'humanReadableName' });

  2. Metric Job Execution on detailed and summary dashboard has the dimension glue.driver.aggregate.numCompletedStages being displayed with Type='gauge' but it needs to be Type='count' as per AWS Glue metrics list here - https://docs.aws.amazon.com/glue/latest/dg/monitoring-awsglue-with-cloudwatch-metrics.html

Expected behavior

The Job Execution metric with dimension glue.driver.aggregate.numCompletedStages should have Type set to count instead of gauge.

Actual behavior

Currently, the Job Execution metric with dimension glue.driver.aggregate.numCompletedStages has Type set to gauge.

Other details

No response

[step-functions] Step Functions failed execution rate alarm inaccurate

Version

1.8.2

Steps and/or minimal code example to reproduce

Using the addFailedExecutionRateAlarm option as shown below:

addFailedExecutionRateAlarm: {
  Warning: {
    maxErrorRate: 0.50,
    period: Duration.hours(12),
  },
},

Expected behavior

I expect the failure rate to somehow account for total executions, similar to the following:

{
    "metrics": [
        [ { "expression": "failures / (failures + successes + aborts + timeouts)", "label": "Failure Rate", "id": "failure_rate" } ],
        [ "AWS/States", "ExecutionsFailed", "StateMachineArn", "arn:aws:states:us-west-2:012345678901:stateMachine:MyStateMachine", { "id": "failures", "visible": false } ],
        [ ".", "ExecutionsSucceeded", ".", ".", { "id": "successes", "visible": false } ],
        [ ".", "ExecutionsAborted", ".", ".", { "id": "aborts", "visible": false } ],
        [ ".", "ExecutionsTimedOut", ".", ".", { "id": "timeouts", "visible": false } ]
    ],
    "view": "timeSeries",
    "stacked": false,
    "period": 43200,
    "annotations": {
        "horizontal": [
            {
                "label": "Failed (avg) > 0.5 for 1 datapoints within 12 hours",
                "value": 0.5
            }
        ]
    },
    "stat": "Sum"
}

Actual behavior

The alarm currently uses only the ExecutionsFailed metric, taking its average even though it is a count metric. The curve of the graph is similar to the expected graph, but the amplitude does not accurately represent the true failure rate.

{
    "metrics": [
        [ "AWS/States", "ExecutionsFailed", "StateMachineArn", "arn:aws:states:us-west-2:012345678901:stateMachine:MyStepFunction", { "id": "m1", "label": "Failed (avg)", "stat": "Average", "visible": true } ]
    ],
    "view": "timeSeries",
    "stacked": false,
    "period": 43200,
    "annotations": {
        "horizontal": [
            {
                "label": "Failed (avg) > 0.5 for 1 datapoints within 12 hours",
                "value": 0.5
            }
        ]
    }
}

Other details

No response

[Infra] Improve eslint/prettier setup

Feature scope

build

Describe your suggested feature

I think our Projen setup is inconsistent in how it applies eslint/prettier rules. It seems to me that there are two conflicting definitions, or something like that; for example, lines are wrapped eagerly, which is not what I would expect. I think we have to find a way to specify the rules in a single place and then make all the tools use the same config.

Found this example:

https://github.com/p6m7g8/awesome-projen/blob/main/.eslintrc.json

[docs] Document CustomMonitoring in more detail

Feature scope

docs

Describe your suggested feature

Document custom monitoring in more detail.

  • provide example for each "mode" (metric, metric with alarm, metric search..)
  • include anomaly detection

[multiple] Add more ways to calculate request/error rate

The way to represent request rate should be customizable by a parameter.

Impacted constructs: API GW, API GW2, AppSync, CloudFront, Glue, Lambda, OpenSearch, Step Functions, Synthetics

Desired modes: Count, Average, per Second/Minute/Hour/Day

[core] CDK v2 support

We should migrate to aws-cdk-lib from monocdk when appropriate. This would likely result in a v1 release of this library.

[elb] Enable Monitoring and Alarms for ALB Metrics

From customer:

I have an ALB+Fargate Coral stack, and i’d like to have a CW Dashboard and Alarms for when my ALB is seeing spikes in 5xx errors. I see the monitorFargateApplicationLoadBalancer, but I don’t think it monitors ALB metrics, just the targets of the ALB

Specifically, HTTPCode_Target_5XX_Count (backend) and HTTPCode_ELB_5XX_Count (LB) would be good to monitor/alarm.
https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-cloudwatch-metrics.html

monitorGlueJob creates metric with scale 0-100, but value is scale 0-1

The leftYAxis is specified as a percentage from 0 to 100, but the source glue cpuUsageMetric and heapMemoryUsageMetrics are 0 to 1 values.

protected createUtilizationWidget(width: number, height: number) {

Here is where the metric is built it comes from "glue.ALL.system.cpuSystemLoad" which ranges from 0 to 1. (Similar for heap usage metric as well)
https://github.com/cdklabs/cdk-monitoring-constructs/blob/b66685e82733c966eba1555d[…]42c174e1de78a59/lib/monitoring/aws-glue/GlueJobMetricFactory.ts
AWS Glue Metrics, state scale is 0-1
https://docs.aws.amazon.com/glue/latest/dg/monitoring-awsglue-with-cloudwatch-metrics.html
I think we either need to use a math metric to multiply the Glue metric values by 100, or change the range to 0-1 and label it something other than "%".
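To sketch the first option: in CDK this would be a metric math expression along the lines of new MathExpression({ expression: "m1 * 100", usingMetrics: { m1: cpuLoad } }). The helper below only illustrates the rescaling itself (rescaleToPercent is a hypothetical name, not library code):

```typescript
// Illustration of the proposed fix: rescale 0-1 Glue utilization values
// to the 0-100 range the widget's Y axis expects. Hypothetical helper.
function rescaleToPercent(values: number[]): number[] {
  return values.map((v) => v * 100);
}

console.log(rescaleToPercent([0.25, 0.5, 1])); // [25, 50, 100]
```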

Packaging issues

Tracking some issues with the npm tarball:

  • TS files are included
  • Lambda assets aren't transpiled

Excluding/Adding metrics in lambda's graphs

Feature scope

Lambda

Describe your suggested feature

Provisioned Concurrency Spillover count is included in lambda's invocation graph even though it is unused by our BoxerTicketingService so we would like to remove it.

Also in the latency graph, we would like to add TM99 metric and remove the P90 and P50 metrics

The ask is the ability to remove/add metrics to lambda's graphs (specifically "Invocations" and "Latency" graphs)

[api-gateway] Allow selective metrics on latency graphs

Currently, if we use .monitorApiGateway(), it plots P50, P90, and P99 latency by default. Such graphs are not required for every use case. Functionality could be added to let the user select which metrics they need plotted on the graph.

This should apply to both APIG v1 and v2, and may apply to other monitoring scopes as well.

[core] Use Graviton for Lambda handlers

Feature scope

Lambda handlers

Describe your suggested feature

We currently have some Lambda handlers (secrets metrics publisher and bitmap dashboard widget renderer) that'd benefit from being migrated to Graviton since they're cheaper.

[Refactor] Mix-up of NLB and ECS APIs

Currently, I see the ECS monitoring has been combined with NLB, ALB. For ex:

  1. monitorFargateService: monitors Fargate service, loadbalancer, targetGroup
  2. monitorSimpleFargateService: monitors Fargate service
  3. monitorFargateNetworkLoadBalancer: looks similar to monitorFargateService but here the loadbalancer and target group are passed separately and not from FargateService
  4. monitorApplicationLoadBalancer: similar to monitorFargateNetworkLoadBalancer but instead of NLB, it is for ALB
  5. monitorQueueProcessingFargateService:
  6. monitorQueueProcessingEc2Service

In my opinion, a better refactor and re-org would split this into specific parts, with each part responsible for monitoring its own resource. This is my initial refactor proposal, but feel free to discuss and refactor as required.

Let's just have monitorSimpleFargateService and monitorSimpleEc2Service (we can even remove simple from name), where first monitors just FargateService and second monitors just Ec2Service. Then have monitorNetworkLoadBalancer, which monitors NetworkLoadBalancer and NetworkTarget Group, monitorApplicationLoadBalancer which monitors ApplicationLoadBalancer and ApplicationTargetGroup.

Now,

  1. remove the /lib/monitoring/aws-ecs-patterns module, or keep it and have monitorNetworkLoadBalancedFargateService internally call monitorFargateService and monitorNetworkLoadBalancer; monitorApplicationLoadBalancedFargateService call monitorFargateService and monitorApplicationLoadBalancer; and monitorQueueProcessingFargateService call monitorFargateService and monitorSQS. Same for Ec2Service
  2. add a module /lib/monitoring/aws-ecs, this module contains monitorFargateService, monitorEc2Service

This is my mental model. I am happy to discuss more.

[core] Better control over dashboard widgets

Dashboards are generally fairly rigid in their implementation which is good if you just want to pick up the good defaults but bad if you want to customize them for any reason.

Ideally, the API should allow for:

  • Adjusting period per-widget (which the dashboard's periodOverride allows respecting)
  • Controlling what widgets are added
  • Controlling what metrics are used
  • Controlling positioning/sizing of widgets
  • Controlling addition of annotations

This may include how widgets are added for custom monitoring, which always uses a calculated width.

[memorydb] Support for MemoryDB

Feature scope

MemoryDB

Describe your suggested feature

Hi Team,

I would like to make a new feature request to support Amazon MemoryDB in CDK monitoring constructs.

I was hoping to see metrics & alarms for predefined metrics for example: CPUUtilization, MemoryUsage, Node Status (If any node in the cluster goes bad).

Thank you,
Kevin Patel

[sqs] SqsQueueMetricFactory using incorrect metric

Version

v1.4.0

Steps and/or minimal code example to reproduce

https://github.com/cdklabs/cdk-monitoring-constructs/blame/be6c84982745015ba130b34207197a9531d1d6b5/lib/monitoring/aws-sqs/SqsQueueMetricFactory.ts#L28

this.queue.metricNumberOfMessagesSent should not be used for metricIncomingMessageCount.
this.queue.metricNumberOfMessagesReceived should be used instead.

Expected behavior

An alarm on "incoming" messages that depends on SqsQueueMetricFactory::metricIncomingMessageCount should yield an alarm on incoming message count.

Actual behavior

The resulting alarm is on sent message count

Other details

No response

[Lambda] Iterator age graph is missing from the lambda monitoring instance

Version

v0.0.43

Steps and/or minimal code example to reproduce

Hi there,

Previously, when adding a threshold for an iterator age on a Lambda, the annotation for the iterator age would be added to the Lambda's latency graph. Although this was wrong and did not belong on the latency graph, it was still useful. Please note: this was while using the internal version of the repo.

Is it possible to add a flag or something to display this graph on the detailed dashboard?

Expected behavior

A widget with the iterator age and appropriate annotations will be displayed when a monitor is added for the iterator age on the lambda.

Actual behavior

No graphs or annotations are added.

Other details

No response

Monitoring support for AWS DocumentDB

Feature scope

DocumentDB

Describe your suggested feature

I just installed v1.10.0 and it does appear that there isn't support for monitoring AWS DocumentDB via this library? I am kindly requesting that this be considered and added, if it's not already been planned.

[core] Extend monitoring snapshot tests to include dashboard code

Feature scope

Tests

Describe your suggested feature

Just noticed we are not testing the dashboard code in the snapshot tests, just the Stack with alarms. Ideally, each test file should verify two snapshots: one for the stack (with alarms) and one for the dashboard JSON.

[apigateway] Expand Monitor API Gateway to allow a list of Methods and Endpoints

Feature scope

AWS API Gateway

Describe your suggested feature

Currently, ApiGatewayMetricFactoryProps allows only a single apiMethod and apiResource. To monitor several resources from a single API, we have to make several .monitorApiGateway() calls, each of which creates its own graph section. Ideally, we should be able to provide a list of apiMethods and apiResources, or a list of pairs like [ { apiMethod: string, apiResource: string } ], with all of them rendered within the same graph.

As a further enhancement, could the resources/methods be detected automatically, allowing a showIndividualResources: boolean to be used instead?
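A possible shape for the suggested props could look like the following. The names here are illustrative only, not the library's actual API:

```typescript
// One method/resource pair to plot; several pairs would share one graph.
interface ApiEndpoint {
  apiMethod: string;
  apiResource: string;
}

// Hypothetical expanded props: either enumerate endpoints explicitly,
// or let the library detect resources/methods automatically.
interface ExpandedApiGatewayMetricFactoryProps {
  endpoints?: ApiEndpoint[];
  showIndividualResources?: boolean;
}

const exampleProps: ExpandedApiGatewayMetricFactoryProps = {
  endpoints: [
    { apiMethod: "GET", apiResource: "/orders" },
    { apiMethod: "POST", apiResource: "/orders" },
  ],
};
```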

Monitoring Construct in Stack in Pipeline Stage

Version

1.7.0

Steps and/or minimal code example to reproduce

My issue is that the monitoring construct, running within a pipeline stage, doesn't deploy any resources.

My steps were:

  1. CDK App has a stack CdkPipeline
  2. CdkPipeline has a stage CdkPipelineStage
  3. CdkPipelineStage includes all stacks
  4. Add monitoring stack in CdkPipelineStage
import { Construct } from 'constructs';
import { Stack, StackProps } from 'aws-cdk-lib';
import { MonitoringFacade } from 'cdk-monitoring-constructs';

// ConstructType is the author's own construct type (definition not shown).
export interface Props extends StackProps {
  constructToMonitor: ConstructType;
}

export class MonitoringStack extends Stack {
  constructor(parent: Construct, name: string, props: Props) {
    super(parent, name, props);

    const { constructToMonitor } = props;

    const monitoring = new MonitoringFacade(this, 'monFacade', {
      alarmFactoryDefaults: {
        actionsEnabled: true,
        alarmNamePrefix: 'alarm',
      },
      metricFactoryDefaults: {},
    });

    monitoring.monitorScope(constructToMonitor);
    monitoring.monitorBilling();
    monitoring.monitorLambdaFunction({
      lambdaFunction: constructToMonitor.sqsLambdaDlq.consumerLambda,
    });
  }
}
  5. Nothing gets deployed except for CDK Metadata

The same thing happens if I add the monitoring construct within an existing stack with other services.

Is it necessary to pass the App, as in the example, or is it also possible to pass a construct?

Expected behavior

Find all resources within my construct and create alarms, dashboards, etc.

Actual behavior

Nothing will be deployed.

Other details

No response

[kinesis] Add monitoring support for Redshift/S3 alarms

We use AWS Firehose to publish data to Redshift and S3 (our Firehose stream publishes to both).
We would like to have metrics and dashboards for that.

Firehose Metrics: https://docs.aws.amazon.com/firehose/latest/dev/monitoring-with-cloudwatch-metrics.html

The most important metrics:

  • DeliveryToRedshift.Success
  • DeliveryToRedshift.DataFreshness
  • DeliveryToS3.Success
  • DeliveryToS3.DataFreshness

I would like to be able to create alarms at least on the Success metric.

See the default Firehose dashboard from the AWS console in the attached file.
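The metrics listed above can be described as plain CloudWatch metric descriptors. The AWS/Firehose namespace and metric names are the real CloudWatch names; the surrounding array shape and statistic choices are illustrative, not this library's API:

```typescript
// Firehose delivery metrics this request asks to monitor/alarm on.
// Success is a ratio-like metric (Average is a reasonable statistic);
// DataFreshness is an age in seconds (Maximum is a reasonable statistic).
const firehoseDeliveryMetrics = [
  { namespace: "AWS/Firehose", metricName: "DeliveryToRedshift.Success", statistic: "Average" },
  { namespace: "AWS/Firehose", metricName: "DeliveryToRedshift.DataFreshness", statistic: "Maximum" },
  { namespace: "AWS/Firehose", metricName: "DeliveryToS3.Success", statistic: "Average" },
  { namespace: "AWS/Firehose", metricName: "DeliveryToS3.DataFreshness", statistic: "Maximum" },
];
```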


[ec2] add filter of instance IDs

As a customer:

I would prefer a property under .monitorEC2instances that would allow me to specify instance IDs, or something along those lines. I didn't deploy the instances via an ASG, so using this construct currently seems like an all-or-nothing approach.
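The requested filter could take a shape like the following. The names are hypothetical, chosen only to illustrate the request:

```typescript
// Hypothetical props for an instance-ID filter; when instanceIds is omitted,
// all instances would be monitored (the current all-or-nothing behavior).
interface MonitorEc2InstancesProps {
  instanceIds?: string[];
}

const ec2Props: MonitorEc2InstancesProps = {
  instanceIds: ["i-0123456789abcdef0"],
};
```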

[core] Support query metrics

Currently, MetricsFactory has createMetricMath, createMetricSearch, and createMetricAnomaly; it doesn't have createMetricQuery.

Why do I need this?

I have a metric with the dimensions [Program, Operation, EntityType, StatusFamily, StatusCode].

To compute an availability metric, it would be 1 - count(fault_request) / count(valid_request), where:

  • success_request: StatusFamily=SUCCESSFUL
  • fault_request: StatusFamily=SERVER_ERROR
  • error_request: StatusFamily=CLIENT_ERROR

In each status family there can be multiple StatusCode values, so this could be done better using query metrics.
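The availability computation described above can be sketched in plain arithmetic, assuming valid requests are successes plus faults (i.e., CLIENT_ERROR requests are excluded from the denominator):

```typescript
// Availability = 1 - fault / valid.
// Assumption: valid_request = success_request + fault_request.
function availability(successCount: number, faultCount: number): number {
  const validCount = successCount + faultCount;
  // Treat "no valid requests" as fully available to avoid division by zero.
  return validCount === 0 ? 1 : 1 - faultCount / validCount;
}
```

A query-metric (CloudWatch Metrics Insights) variant would aggregate the counts across all StatusCode values within each StatusFamily before applying this formula.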

Add to README how to use MonitoringAspect

Feature scope

README

Describe your suggested feature

It would be great if the README included a default setup showing how to use MonitoringAspect, as this is the preferred way to add monitoring to each resource created.

[waf] Support for AWS WAF

Feature scope

AWS WAF

Describe your suggested feature

AWS offers several different firewall tools. Having the Web Application Firewall (WAF) monitored by this library would be greatly helpful.

[dynamo] Disallow utilization metrics/alarms if a table is using on-demand capacity

Version

v1.14.0

Steps and/or minimal code example to reproduce

  1. Monitor a DDB table
  2. Note that the Utilization metrics in the dashboards show warnings
  3. Note that you can technically create alarms against utilization metrics

Expected behavior

If a DDB table is using on-demand capacity, the utilization metrics should be excluded from the dashboard widgets and alarms should not be able to be created.
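The requested guard can be sketched as follows. The function name and shape are illustrative, not the library's actual API; the billing-mode values are the real DynamoDB ones:

```typescript
// DynamoDB billing modes: utilization (consumed/provisioned) is only
// meaningful when capacity is provisioned.
type BillingMode = "PROVISIONED" | "PAY_PER_REQUEST";

// Hypothetical guard: dashboards would include utilization widgets, and
// utilization alarms would be allowed, only for provisioned tables.
function shouldIncludeUtilizationMetrics(billingMode: BillingMode): boolean {
  return billingMode === "PROVISIONED";
}
```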

Actual behavior

Dashboards include metrics and you can create (invalid) alarms.

Other details

No response

[lambda] MonitoringFacade Alarming Inaccuracy/Bad Method Name

The code below generates the attached dashboard image. IteratorAge is its own Lambda metric, independent of Duration. Is there a mistake in my configuration?

monitoring.monitorLambdaFunction({
  // ...
  addMaxIteratorAgeAlarm: {
    Warning: {
      maxAgeInMillis: ...,
    },
  },
  // ...
});

(attached: dashboard screenshot from 2022-03-04)
