
CloudWatch Exporter

A Prometheus exporter for Amazon CloudWatch.

Alternatives

For ECS workloads, there is also an ECS exporter.

For a different approach to CloudWatch metrics, with automatic discovery, consider Yet Another CloudWatch Exporter (YACE).

Building and running

CloudWatch Exporter requires at least Java 11.

mvn package to build.

java -jar target/cloudwatch_exporter-*-SNAPSHOT-jar-with-dependencies.jar 9106 example.yml to run.

The most recent pre-built JAR can be found at http://search.maven.org/#search%7Cga%7C1%7Ca%3A%22cloudwatch_exporter%22

Credentials and permissions

The CloudWatch Exporter uses the AWS Java SDK, which offers a variety of ways to provide credentials. This includes the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables.

The cloudwatch:ListMetrics, cloudwatch:GetMetricStatistics and cloudwatch:GetMetricData IAM permissions are required. The tag:GetResources IAM permission is also required to use the aws_tag_select feature.
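
A minimal IAM policy granting these permissions might look like the following sketch (the Sid is illustrative, and you may want to scope Resource more tightly than "*" where your environment allows):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "CloudWatchExporter",
      "Effect": "Allow",
      "Action": [
        "cloudwatch:ListMetrics",
        "cloudwatch:GetMetricStatistics",
        "cloudwatch:GetMetricData",
        "tag:GetResources"
      ],
      "Resource": "*"
    }
  ]
}
```

Omit tag:GetResources if you do not use the aws_tag_select feature.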

Configuration

The configuration is in YAML.

An example with common options and aws_dimension_select:

---
region: eu-west-1
metrics:
 - aws_namespace: AWS/ELB
   aws_metric_name: RequestCount
   aws_dimensions: [AvailabilityZone, LoadBalancerName]
   aws_dimension_select:
     LoadBalancerName: [myLB]
   aws_statistics: [Sum]

A similar example with common options and aws_tag_select:

---
region: eu-west-1
metrics:
 - aws_namespace: AWS/ELB
   aws_metric_name: RequestCount
   aws_dimensions: [AvailabilityZone, LoadBalancerName]
   aws_tag_select:
     tag_selections:
       Monitoring: ["enabled"]
     resource_type_selection: "elasticloadbalancing:loadbalancer"
     resource_id_dimension: LoadBalancerName
   aws_statistics: [Sum]

Note: configuration examples for different namespaces can be found in the examples directory.

Note: A configuration builder can be found here.

Name Description
region Optional. The AWS region to connect to. If none is provided, an attempt will be made to determine the region from the default region provider chain.
role_arn Optional. The AWS role to assume. Useful for retrieving cross account metrics.
metrics Required. A list of CloudWatch metrics to retrieve and export
aws_namespace Required. Namespace of the CloudWatch metric.
aws_metric_name Required. Metric name of the CloudWatch metric.
aws_dimensions Required. This must list exactly the dimensions available for the metric. Run aws cloudwatch list-metrics to find out which dimensions you need to include for your metric.
aws_dimension_select Optional. Which dimension values to filter. Specify a map from the dimension name to a list of values to select from that dimension.
aws_dimension_select_regex Optional. Which dimension values to filter on with a regular expression. Specify a map from the dimension name to a list of regexes that will be applied to select from that dimension.
aws_tag_select Optional. A tag configuration to filter on, based on mapping from the tagged resource ID to a CloudWatch dimension.
tag_selections Optional, under aws_tag_select. Specify a map from a tag key to a list of tag values to apply tag filtering on resources from which metrics will be gathered.
resource_type_selection Required, under aws_tag_select. Specify the resource type to filter on, in the form service:resource_type, as per the resource group tagging API. Note that resource_type can be an empty string, as in the S3 case: resource_type_selection: "s3:".
resource_id_dimension Required, under aws_tag_select. For the current metric, specify which CloudWatch dimension maps to the ARN resource ID.
arn_resource_id_regexp Optional, under aws_tag_select. If the CloudWatch dimension specified in resource_id_dimension doesn't conform to the usual resource ID convention, an alternative regular expression to extract the resource ID from the ARN can be given here. The default is `(?:([^:/]+)
aws_statistics Optional. A list of statistics to retrieve, values can include Sum, SampleCount, Minimum, Maximum, Average. Defaults to all statistics unless extended statistics are requested.
aws_extended_statistics Optional. A list of extended statistics to retrieve. Extended statistics currently include percentiles in the form pN or pN.N.
delay_seconds Optional. The newest data to request. Used to avoid collecting data that has not fully converged. Defaults to 600s. Can be set globally and per metric.
range_seconds Optional. How far back to request data for. Useful for cases such as Billing metrics that are only set every few hours. Defaults to 600s. Can be set globally and per metric.
period_seconds Optional. Period to request the metric for. Only the most recent data point is used. Defaults to 60s. Can be set globally and per metric.
set_timestamp Optional. Boolean for whether to set the Prometheus metric timestamp as the original Cloudwatch timestamp. For some metrics which are updated very infrequently (such as S3/BucketSize), Prometheus may refuse to scrape them if this is set to true (see #100). Defaults to true. Can be set globally and per metric.
use_get_metric_data Optional. Boolean (experimental) Use GetMetricData API to get metrics instead of GetMetricStatistics. Can be set globally and per metric.
list_metrics_cache_ttl Optional. Number of seconds to cache the result of calling the ListMetrics API. Defaults to 0 (no cache). Can be set globally and per metric.
warn_on_empty_list_dimensions Optional. Boolean. Emit a warning if the exporter cannot determine which metrics to request.
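
As a sketch of how global and per-metric settings combine, here is a hypothetical Billing configuration (all values are illustrative):

```yaml
---
region: us-east-1
period_seconds: 60        # global default for all metrics
delay_seconds: 600
metrics:
  - aws_namespace: AWS/Billing
    aws_metric_name: EstimatedCharges
    aws_dimensions: [Currency]
    aws_statistics: [Maximum]
    range_seconds: 86400  # per-metric override: Billing data is only set every few hours
```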

The first example config above will export time series such as

# HELP aws_elb_request_count_sum CloudWatch metric AWS/ELB RequestCount Dimensions: ["AvailabilityZone","LoadBalancerName"] Statistic: Sum Unit: Count
# TYPE aws_elb_request_count_sum gauge
aws_elb_request_count_sum{job="aws_elb",instance="",load_balancer_name="mylb",availability_zone="eu-west-1c",} 42.0
aws_elb_request_count_sum{job="aws_elb",instance="",load_balancer_name="myotherlb",availability_zone="eu-west-1c",} 7.0

If the aws_tag_select feature was used, an additional information metric will be exported for each AWS tagged resource matched by the resource type selection and tag selection (if specified), such as

# HELP aws_resource_info AWS information available for resource
# TYPE aws_resource_info gauge
aws_resource_info{job="aws_elb",instance="",arn="arn:aws:elasticloadbalancing:eu-west-1:121212121212:loadbalancer/mylb",load_balancer_name="mylb",tag_Monitoring="enabled",tag_MyOtherKey="MyOtherValue",} 1.0

aws_resource_info can be joined with other metrics using group_left in PromQL, such as the following:

  aws_elb_request_count_sum
* on(load_balancer_name) group_left(tag_MyOtherKey)
  aws_resource_info

All metrics are exported as gauges.

In addition, cloudwatch_exporter_scrape_error will be non-zero if an error occurred during the scrape, and cloudwatch_exporter_scrape_duration_seconds contains the duration of that scrape. cloudwatch_exporter_build_info contains labels referencing the current build version and build release date.
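
These self-metrics are useful for alerting. A sketch of a Prometheus alerting rule on scrape failures (the alert name and duration are illustrative):

```yaml
groups:
  - name: cloudwatch-exporter
    rules:
      - alert: CloudWatchExporterScrapeError
        expr: cloudwatch_exporter_scrape_error > 0
        for: 15m
        annotations:
          summary: "CloudWatch exporter failed to scrape CloudWatch"
```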

Build Info Metric

cloudwatch_exporter_build_info is a default CloudWatch exporter metric that contains the current exporter version and release date as label values. The numeric metric value is statically set to 1. If the metric's label values are "unknown", retrieving the build information failed.

CloudWatch doesn't always report data

Depending on the metric, CloudWatch reports data either always or only in some cases, for example only when there is a non-zero value. The CloudWatch Exporter mirrors this behavior, so refer to the CloudWatch documentation to find out whether your metric is always reported.

Timestamps

CloudWatch has been observed to sometimes take minutes for reported values to converge. The default delay_seconds will result in data that is at least 10 minutes old being requested to mitigate this. The samples exposed will have the timestamps of the data from CloudWatch, so usual staleness semantics will not apply and values will persist for 5m for instant vectors.

In practice this means that if you evaluate an instant vector at the current time, you will not see data from CloudWatch. An expression such as aws_elb_request_count_sum offset 10m will allow you to access the data, and should be used in recording rules and alerts.
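
A recording rule applying this offset might look like the following sketch (the rule name is illustrative):

```yaml
groups:
  - name: cloudwatch
    rules:
      - record: aws_elb:request_count_sum
        expr: aws_elb_request_count_sum offset 10m
```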

For certain metrics which update relatively rarely, such as from S3, set_timestamp should be configured to false so that they are not exposed with a timestamp. This is as the true timestamp from CloudWatch could be so old that Prometheus would reject the sample.
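
For example, for the S3 BucketSizeBytes metric, which CloudWatch updates only about once a day, a configuration along these lines avoids the stale-timestamp problem (values are illustrative):

```yaml
metrics:
  - aws_namespace: AWS/S3
    aws_metric_name: BucketSizeBytes
    aws_dimensions: [BucketName, StorageType]
    aws_statistics: [Average]
    range_seconds: 172800   # look back two days to catch the daily data point
    period_seconds: 86400
    set_timestamp: false    # expose without the (very old) CloudWatch timestamp
```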

FAQ: I can see the metrics in /metrics but not in the Prometheus web console

The metrics will be visible in Prometheus if you look more than delay_seconds in the past. Try the graph view.

This is an unfortunate result of a fundamental mismatch between CloudWatch and Prometheus. CloudWatch metrics converge over time, that is, the value at time T can change up to some later time T+dT. Meanwhile, Prometheus assumes that once it has scraped a sample, that is the truth, and the past does not change.

To compensate for this, by default the exporter delays fetching metrics, that is, it only asks for data 10 minutes later, when almost all AWS services have converged. It also reports to Prometheus that this sample is from the past. Because Prometheus, for an instant request, only looks back 5 minutes, it never sees any data "now".

Special handling for certain DynamoDB metrics

The DynamoDB metrics listed below break the usual CloudWatch data model.

  • ConsumedReadCapacityUnits
  • ConsumedWriteCapacityUnits
  • ProvisionedReadCapacityUnits
  • ProvisionedWriteCapacityUnits
  • ReadThrottleEvents
  • WriteThrottleEvents

When these metrics are requested in the TableName dimension CloudWatch will return data only for the table itself, not for its Global Secondary Indexes. Retrieving data for indexes requires requesting data across both the TableName and GlobalSecondaryIndexName dimensions. This behaviour is different to that of every other CloudWatch namespace and requires that the exporter handle these metrics differently to avoid generating duplicate HELP and TYPE lines.

When exporting one of the problematic metrics for an index the exporter will use a metric name in the format aws_dynamodb_METRIC_index_STATISTIC rather than the usual aws_dynamodb_METRIC_STATISTIC. The regular naming scheme will still be used when exporting these metrics for a table, and when exporting any other DynamoDB metrics not listed above.
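
For illustration, ConsumedReadCapacityUnits with the Sum statistic would be exposed along these lines (sample names and values are hypothetical):

```
# table-level data point, usual naming
aws_dynamodb_consumed_read_capacity_units_sum{table_name="mytable",} 120.0
# index-level data point, "_index_" naming to avoid duplicate HELP/TYPE lines
aws_dynamodb_consumed_read_capacity_units_index_sum{table_name="mytable",global_secondary_index_name="myindex",} 40.0
```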

Reloading Configuration

There are two ways to reload configuration:

  1. Send a SIGHUP signal to the pid: kill -HUP 1234
  2. POST to the reload endpoint: curl -X POST localhost:9106/-/reload

If an error occurs during the reload, check the exporter's log output.

Cost

Amazon charges either for every CloudWatch API request or for every CloudWatch metric requested; see the current charges.

  • In case of using GetMetricStatistics (default) - Every metric retrieved requires one API request, which can include multiple statistics.
  • In addition, when aws_dimensions is provided, the exporter needs to do API requests to determine what metrics to request. This should be negligible compared to the requests for the metrics themselves.

If all aws_dimensions are provided in the aws_dimension_select list, the exporter skips the API request above and instead requests all possible combinations of values for those dimensions. This reduces cost, since the dimension values no longer need to be queried, assuming that all possible value combinations are present in CloudWatch.

If you have 100 API requests every minute, with the price of USD$10 per million requests (as of Aug 2018), that is around $45 per month. The cloudwatch_requests_total counter tracks how many requests are being made.
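
The arithmetic behind that estimate, as a quick sanity check (assuming a 30-day month):

```python
requests_per_minute = 100
minutes_per_month = 60 * 24 * 30  # 43,200 minutes in a 30-day month
requests_per_month = requests_per_minute * minutes_per_month  # 4,320,000 requests

cost_per_million = 10.0  # USD per million requests, as of Aug 2018
monthly_cost = requests_per_month / 1_000_000 * cost_per_million
print(round(monthly_cost, 2))  # roughly the $45/month figure stated above
```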

When using the aws_tag_select feature, additional requests are made to the Resource Groups Tagging API, but these are free. The tagging_api_requests_total counter tracks how many requests are being made for these.

Experimental GetMetricData

We are transitioning to GetMetricData instead of GetMetricStatistics. The main benefit of GetMetricData is much better performance.

Please refer to this doc explaining why it is best practice to use GetMetricData.

API Performance Cost Stability
GetMetricStatistics May be slow at scale Charged per API request Stable (default option)
GetMetricData Can retrieve data faster at scale Charged per metric requested New (opt-in via configuration)

Transition plan

At first, this feature is opt-in, allowing you to decide when and how to test it. In later versions we will swap the default so everyone can enjoy the benefits.
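
Opting in is done via the use_get_metric_data flag described in the configuration table above, for example:

```yaml
---
region: eu-west-1
use_get_metric_data: true   # global opt-in; can also be set per metric
metrics:
  - aws_namespace: AWS/ELB
    aws_metric_name: RequestCount
    aws_dimensions: [AvailabilityZone, LoadBalancerName]
    aws_statistics: [Sum]
```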

The CloudWatch exporter also exposes a new self-metric called cloudwatch_metrics_requested_total, which allows you to track the number of requested metrics in addition to the number of API requests.

Docker Images

To run the CloudWatch exporter on Docker, you can use the image from quay.io/prometheus/cloudwatch-exporter.

The available tags are

  • main: snapshot updated on every push to the main branch
  • latest: the latest released version
  • vX.Y.Z: the specific version X.Y.Z. Note that up to version 0.11.0, the format was cloudwatch-exporter_X.Y.Z.

The image exposes port 9106 and expects the config in /config/config.yml. To configure it, you can bind-mount a config from your host:

docker run -p 9106 -v /path/on/host/config.yml:/config/config.yml quay.io/prometheus/cloudwatch-exporter

Alternatively, specify a different config file as the CMD:

docker run -p 9106 -v /path/on/host/us-west-1.yml:/config/us-west-1.yml quay.io/prometheus/cloudwatch-exporter /config/us-west-1.yml

Or create a config file named config.yml along with the following Dockerfile in the same directory and build it with docker build:

FROM prom/cloudwatch-exporter
ADD config.yml /config/


cloudwatch_exporter's Issues

Add optional timestamps to exposition data

I made a quick hack which added timestamps to the exposition format (I hope I use the right term). Like in the example below.

# HELP aws_elb_healthy_host_count_average CloudWatch metric AWS/ELB HealthyHostCount Dimensions: [LoadBalancerName] Statistic: Average Unit: Count
# TYPE aws_elb_healthy_host_count_average gauge
aws_elb_healthy_host_count_average{job="aws_elb",load_balancer_name="aaa",} 1.0 1455192180000
aws_elb_healthy_host_count_average{job="aws_elb",load_balancer_name="bbb",} 1.0 1455192180000
aws_elb_healthy_host_count_average{job="aws_elb",load_balancer_name="ccc",} 1.0 1455192180000

The advantage of this is that the timestamps in prometheus will match up with CloudWatch, which makes it easier if only a subset of data is exported and CloudWatch is needed for drilldown. Due to this I think this would be a really good option to be able to configure.

However, collector.MetricFamilySamples.Sample does not support the addition of timestamps, and write004, for instance, does not support it either.

Would this be a desired feature, and in that case what would be the best way to implement it?

polling frequency

I don't seem to see any configuration for the polling frequency. Is this something which is actually modifiable? If not, is this the sort of pull request that would be welcome?

An issue gathering metrics from SQS queues

After configuring a number of SQS metrics (ApproximateNumberOfMessagesVisible) for some queues and trying to retrieve them, our Prometheus server shows this message as a result of the CloudWatch exporter:

text format parsing error in line 54: second HELP line for metric name "aws_sqs_approximate_number_of_messages_visible_sum"

Here's a sample of these metrics

 53 aws_sqs_approximate_number_of_messages_visible_sum{job="aws_sqs",instance="",queue_name="provisioning-notify-queue-protesis",} 0.0
 54 # HELP aws_sqs_approximate_number_of_messages_visible_sum CloudWatch metric AWS/SQS ApproximateNumberOfMessagesVisible Dimensions: [QueueName] Statistic: Sum Unit: Count
 55 # TYPE aws_sqs_approximate_number_of_messages_visible_sum gauge

and here's a snippet of the metric configuration file:

 - aws_namespace: AWS/SQS
  aws_metric_name: ApproximateNumberOfMessagesVisible
  aws_dimensions:
  - QueueName
  aws_dimension_select:
    SQSName:
    - survela-provisioning-notify
    - survela-provisioning-check
  aws_statistics:
  - Sum
- aws_namespace: AWS/SQS
  aws_metric_name: ApproximateNumberOfMessagesVisible
  aws_dimensions:
  - QueueName
  aws_dimension_select:
    SQSName:
    - provisioning-notify-queue-pro
    - provisioning-check-queue-pro
  aws_statistics:
  - Sum

any help would be fantastic, thanks in advance

Is aws_dimensions really optional?

The README suggests that the aws_dimensions value is optional, but all examples include at least one dimension, and in the case of AWS/ApplicationELB, I received no metrics until a value of "LoadBalancer" was included.

If the value is "optional" in the sense that it is not required for every namespace, can this be clarified in the docs?

Cheers,
...Bryan

AWS ALB

Seems the target group dimension is not working?

  - aws_namespace: AWS/ApplicationELB
    aws_metric_name: TargetResponseTime
    aws_dimensions: [TargetGroup]
    aws_dimension_select:
      TargetGroup: [targetgroup/example/example-arn]
    aws_statistics: [Average]

Getting metrics for dead ELBs might cost a lot of money

This might be also relative to other resources, but I felt the pain with ELBs.

Consider the following configuration:

  - aws_namespace: AWS/ELB
    aws_metric_name: UnHealthyHostCount
    aws_dimensions: [AvailabilityZone, LoadBalancerName]
    aws_dimension_select_regex:
      LoadBalancerName: [myELB-.*]

What happens in CloudWatchCollector is that ELBs that do not exist at the moment of scraping are also returned as part of the available metrics in the results of getDimensions(). The scrape() method then calls an API per ELB and rules out the resources whose data points are too old. You pay for each API call, so you just paid for a call that brought you metrics for an ELB that is already dead.

Imagine a development environment, where ELBs are constantly created and destroyed. The list gets longer with time and the costs go higher.

I handled this issue by forking the project and using the ELB API to filter out dead ELBs before the metrics API is called, and I will be happy to create a merge request for that, but the solution is specific to ELBs, and I guess the problem might be relevant to other resource types as well.

Thanks.

CloudWatch scrape failed cannot be cast to java.util.List

Dec 22, 2016 8:39:36 PM io.prometheus.cloudwatch.CloudWatchCollector collect
WARNING: CloudWatch scrape failed
java.lang.ClassCastException: java.lang.String cannot be cast to java.util.List
at io.prometheus.cloudwatch.CloudWatchCollector.metricIsInAwsDimensionSelectRegex(CloudWatchCollector.java:240)
at io.prometheus.cloudwatch.CloudWatchCollector.useMetric(CloudWatchCollector.java:207)
at io.prometheus.cloudwatch.CloudWatchCollector.getDimensions(CloudWatchCollector.java:187)
at io.prometheus.cloudwatch.CloudWatchCollector.scrape(CloudWatchCollector.java:312)
at io.prometheus.cloudwatch.CloudWatchCollector.collect(CloudWatchCollector.java:377)
at io.prometheus.client.CollectorRegistry$MetricFamilySamplesEnumeration.findNextElement(CollectorRegistry.java:73)
at io.prometheus.client.CollectorRegistry$MetricFamilySamplesEnumeration.<init>(CollectorRegistry.java:65)
at io.prometheus.client.CollectorRegistry.metricFamilySamples(CollectorRegistry.java:56)
at io.prometheus.client.exporter.MetricsServlet.doGet(MetricsServlet.java:41)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:735)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:848)

ES metrics not appearing

Hi - I am using the following config point, but see nothing for the ES metrics appearing when calling the metrics URL.

- aws_namespace: AWS/ES
      aws_metric_name: Nodes
      aws_dimensions: [ClientId, DomainName]
      aws_dimension_select:
        ClientId: [ 123456789 ]
        DomainName: [ myEsDomain ]
      aws_statistics: [ SampleCount ]
      period_seconds: 3600

I can query the metrics via aws cli, but using the same detail in the config still doesn't net anything.

Am I missing something or is this possibly a AWS issue? I am currently successfully pulling EC2 and RDS metrics.

failed working in AWS china region

2016-03-28 09:22:09.887:INFO:oejs.Server:jetty-8.y.z-SNAPSHOT
2016-03-28 09:22:09.935:INFO:oejs.AbstractConnector:Started [email protected]:9106
Mar 28, 2016 9:22:14 AM com.amazonaws.http.AmazonHttpClient executeHelper
INFO: Unable to execute HTTP request: monitoring.cn-north-1.amazonaws.com: Name or service not known
java.net.UnknownHostException: monitoring.cn-north-1.amazonaws.com: Name or service not known
    at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
    at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:922)
    at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1316)
    at java.net.InetAddress.getAllByName0(InetAddress.java:1269)
    at java.net.InetAddress.getAllByName(InetAddress.java:1185)
    at java.net.InetAddress.getAllByName(InetAddress.java:1119)
    at com.amazonaws.SystemDefaultDnsResolver.resolve(SystemDefaultDnsResolver.java:27)
    at com.amazonaws.http.DelegatingDnsResolver.resolve(DelegatingDnsResolver.java:38)
    at org.apache.http.impl.conn.DefaultClientConnectionOperator.resolveHostname(DefaultClientConnectionOperator.java:259)
    at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:159)
    at org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:304)
    at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:611)
    at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:446)
    at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:863)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57)
    at com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:769)
    at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:506)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:318)
    at com.amazonaws.services.cloudwatch.AmazonCloudWatchClient.invoke(AmazonCloudWatchClient.java:886)
    at com.amazonaws.services.cloudwatch.AmazonCloudWatchClient.listMetrics(AmazonCloudWatchClient.java:665)
    at io.prometheus.cloudwatch.CloudWatchCollector.getDimensions(CloudWatchCollector.java:161)
    at io.prometheus.cloudwatch.CloudWatchCollector.scrape(CloudWatchCollector.java:294)
    at io.prometheus.cloudwatch.CloudWatchCollector.collect(CloudWatchCollector.java:359)
    at io.prometheus.client.CollectorRegistry$MetricFamilySamplesEnumeration.findNextElement(CollectorRegistry.java:73)
    at io.prometheus.client.CollectorRegistry$MetricFamilySamplesEnumeration.<init>(CollectorRegistry.java:65)
    at io.prometheus.client.CollectorRegistry.metricFamilySamples(CollectorRegistry.java:56)
    at io.prometheus.client.exporter.MetricsServlet.doGet(MetricsServlet.java:41)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:735)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:848)
    at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:648)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
    at org.eclipse.jetty.server.Server.handle(Server.java:365)
    at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485)
    at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:926)
    at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:988)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:635)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
    at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
    at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:627)
    at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:51)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
    at java.lang.Thread.run(Thread.java:745)

Export data to use in a graph

Not an issue, but it's more a question.
Currently when a specific event occurs in my application, a metric is recorded in CloudWatch with the value of 1.
I have thousands of events a day. And I want to display graph data about how many times this event occurs per hour (or per day) in Prometheus.
I thought I could use this tool for that, but I only see a single value per dimension (as stated in the docs: a gauge). Am I doing something wrong?

Bad idea to export label job

Due to how relabeling works, job is not available during relabeling, so we as users cannot do anything with the label. It cannot be mutated or copied into another label. It simply appears as label exported_job into Prometheus master.

Suggest changing the name of the exported job label to "namespace" which is both more appropriate and does not conflict at all with the built in autolabeling of metrics.

Thanks.

Read timeout

Hi Team

I tried running the exporter and got the below error:

Sep 05, 2017 12:30:12 PM io.prometheus.cloudwatch.CloudWatchCollector collect
WARNING: CloudWatch scrape failed
com.amazonaws.SdkClientException: Unable to unmarshall response (ParseError at [row,col]:[560,40]
Message: Read timed out). Response Code: 200, Response Text: OK
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleResponse(AmazonHttpClient.java:1525)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1222)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1035)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:747)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:721)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:704)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:672)
at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:654)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:518)
at com.amazonaws.services.cloudwatch.AmazonCloudWatchClient.doInvoke(AmazonCloudWatchClient.java:965)
at com.amazonaws.services.cloudwatch.AmazonCloudWatchClient.invoke(AmazonCloudWatchClient.java:941)
at com.amazonaws.services.cloudwatch.AmazonCloudWatchClient.listMetrics(AmazonCloudWatchClient.java:684)
at io.prometheus.cloudwatch.CloudWatchCollector.getDimensions(CloudWatchCollector.java:188)
at io.prometheus.cloudwatch.CloudWatchCollector.scrape(CloudWatchCollector.java:329)
at io.prometheus.cloudwatch.CloudWatchCollector.collect(CloudWatchCollector.java:410)
at io.prometheus.client.CollectorRegistry$MetricFamilySamplesEnumeration.findNextElement(CollectorRegistry.java:143)
at io.prometheus.client.CollectorRegistry$MetricFamilySamplesEnumeration.nextElement(CollectorRegistry.java:158)
at io.prometheus.client.CollectorRegistry$MetricFamilySamplesEnumeration.nextElement(CollectorRegistry.java:128)
at java.util.Collections.list(Collections.java:5240)
at io.prometheus.client.exporter.common.TextFormat.write004(TextFormat.java:22)
at io.prometheus.client.exporter.MetricsServlet.doGet(MetricsServlet.java:40)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:735)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:848)
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:648)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:365)
at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485)
at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:926)
at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:988)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:635)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:627)
at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:51)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:748)

Any help please...

trustAnchors parameter must be non-empty

Hi, I've done what the readme has said. I am supplying the credentials and calling the jar correctly; however, this error seems to be causing an issue and there is very little about it. Is this something to do with the Java version I am using?

Any help in understanding what is going on and what I may need to do to fix it is appreciated

openjdk version "1.8.0_111"
OpenJDK Runtime Environment (build 1.8.0_111-8u111-b14-3~14.04.1-b14)
OpenJDK 64-Bit Server VM (build 25.111-b14, mixed mode)
Jan 12, 2017 4:57:31 PM io.prometheus.cloudwatch.CloudWatchCollector collect
WARNING: CloudWatch scrape failed
com.amazonaws.SdkClientException: Unable to execute HTTP request: java.lang.RuntimeException: Unexpected error: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:970)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:675)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:649)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:632)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$300(AmazonHttpClient.java:600)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:582)
        at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:446)
        at com.amazonaws.services.cloudwatch.AmazonCloudWatchClient.doInvoke(AmazonCloudWatchClient.java:931)
        at com.amazonaws.services.cloudwatch.AmazonCloudWatchClient.invoke(AmazonCloudWatchClient.java:907)
        at com.amazonaws.services.cloudwatch.AmazonCloudWatchClient.listMetrics(AmazonCloudWatchClient.java:652)
        at io.prometheus.cloudwatch.CloudWatchCollector.getDimensions(CloudWatchCollector.java:179)
        at io.prometheus.cloudwatch.CloudWatchCollector.scrape(CloudWatchCollector.java:312)
        at io.prometheus.cloudwatch.CloudWatchCollector.collect(CloudWatchCollector.java:377)
        at io.prometheus.client.CollectorRegistry$MetricFamilySamplesEnumeration.findNextElement(CollectorRegistry.java:73)
        at io.prometheus.client.CollectorRegistry$MetricFamilySamplesEnumeration.nextElement(CollectorRegistry.java:88)
        at io.prometheus.client.CollectorRegistry$MetricFamilySamplesEnumeration.nextElement(CollectorRegistry.java:58)
        at java.util.Collections.list(Collections.java:5240)
        at io.prometheus.client.exporter.common.TextFormat.write004(TextFormat.java:17)
        at io.prometheus.client.exporter.MetricsServlet.doGet(MetricsServlet.java:41)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:735)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:848)
        at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:648)
        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072)
        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382)
        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
        at org.eclipse.jetty.server.Server.handle(Server.java:365)
        at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485)
        at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:926)
        at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:988)
        at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:635)
        at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
        at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
        at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:627)
        at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:51)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
        at java.lang.Thread.run(Thread.java:745)

Dynamic config reloading

Are there any plans for this app to support reloading the config file at runtime, similar to Prometheus's reload endpoint (curl -X POST http://127.0.0.1:443/-/reload)?

"AWS/Billing" metrics do not appear

Hi,

first of all, thanks for your work! While testing the exporter, it turned out that I was not able to fetch estimated charges (using a valid IAM role). cloudwatch_exporter shows no error in the logs or in the scrape_error counter.

Here is a sample config:


---
region: us-east-1
metrics:
 - aws_namespace: AWS/Billing
   aws_metric_name: EstimatedCharges
   aws_dimensions: [ServiceName, LinkedAccount, Currency]
   aws_dimension_select:
     Currency: [USD]
   aws_statistics: [Sum]
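
A possible explanation, hedged since the issue report alone can't confirm it: AWS/Billing datapoints are published only every few hours, so the exporter's default lookback window can find no values even though the API call itself succeeds. Extending the window may help; the 21600 below is an assumption matching the six-hour publishing interval:

```yaml
---
region: us-east-1
metrics:
 - aws_namespace: AWS/Billing
   aws_metric_name: EstimatedCharges
   aws_dimensions: [ServiceName, LinkedAccount, Currency]
   aws_dimension_select:
     Currency: [USD]
   aws_statistics: [Sum]
   range_seconds: 21600
```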

[k8s] mounting into "/" is prohibited and cannot find "java"

Hi,

I tried to deploy cloudwatch_exporter on Kubernetes (k8s) at AWS. I hit this error:

$ kubectl logs cloudwatch-556263178-eqd6t 
Timestamp: 2016-08-17 01:03:19.851546984 +0000 UTC
Code: System error

Message: mounting into / is prohibited

I share my configmap, deployment, and service as:

$ cat ./cloudwatch-configmap.yaml | nc termbin.com 9999
http://termbin.com/v80k
$ cat ./cloudwatch-deployment.yaml.share | nc termbin.com 9999
http://termbin.com/ndop
$ cat ./cloudwatch-service.yaml | nc termbin.com 9999
http://termbin.com/6rym

Also, after modifying and recompiling the prom/cloudwatch-exporter Dockerfile and redeploying to AWS/k8s, it keeps complaining that it cannot find "java". I suspect the default Dockerfile was written a while back when openjdk was still on 1.7; now it's 1.8, but the alternatives links are not set up properly. This is my fix:

$ cat Dockerfile | nc termbin.com 9999
http://termbin.com/856g

BTW, the base image for my k8s is Debian Jessie.

AWS/S3 metrics do not work

With a barebones config file:


---
region: us-east-1
metrics:
- aws_namespace: AWS/S3
  aws_metric_name: BucketSizeBytes

I get no results. In fact, it doesn't even claim it's talking to AWS at all, and the request counter stays at zero.

curl -s 0:29992/metrics
# HELP cloudwatch_requests_total API requests made to CloudWatch
# TYPE cloudwatch_requests_total counter
cloudwatch_requests_total 0.0
# HELP cloudwatch_exporter_scrape_duration_seconds Time this CloudWatch scrape took, in seconds.
# TYPE cloudwatch_exporter_scrape_duration_seconds gauge
cloudwatch_exporter_scrape_duration_seconds 0.588527659
# HELP cloudwatch_exporter_scrape_error Non-zero if this scrape failed.
# TYPE cloudwatch_exporter_scrape_error gauge
cloudwatch_exporter_scrape_error 0.0

If I come back in five or ten seconds, cloudwatch_requests_total will have incremented by 1. But no errors, and no data.
I have fiddled about with various settings for dimensions, but it never reports any data.

I can get this working fine for AWS/ELB resources, so I know I can talk to AWS.
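
A likely cause, offered as a hypothesis: the daily S3 storage metrics carry a StorageType dimension and are reported only once per day, so the default period and lookback window find no datapoints. A configuration along these lines has been reported to work; the exact range value is an assumption:

```yaml
---
region: us-east-1
metrics:
- aws_namespace: AWS/S3
  aws_metric_name: BucketSizeBytes
  aws_dimensions: [BucketName, StorageType]
  aws_statistics: [Average]
  period_seconds: 86400
  range_seconds: 172800
```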

Unable to use Docker image behind corporate proxy

I have been trying to run the Docker image behind a corporate proxy. How can I pass the proxy username and port to the Docker image? I tried setting the HTTP_PROXY and http_proxy environment variables, but they do not seem to work.
Thus, the container is not able to talk to the AWS API.

Document aws_dimension_select_regex in README

I was looking for a way to export CloudWatch metrics for a subset of our DynamoDB tables to Prometheus. I had almost given up on this official exporter because it seemed like it could not do that, until I found out about aws_dimension_select_regex by reading the source code.

This is how I now fetch CloudWatch metrics for our production DynamoDB tables:

{
  "region": "us-east-1",
  "metrics": [
    {
      "aws_namespace": "AWS/DynamoDB",
      "aws_metric_name": "ProvisionedReadCapacityUnits",
      "aws_dimensions": [
        "TableName"
      ],
      "aws_dimension_select_regex": {
        "TableName": [ "(.*)_production" ]
      }
    }
  ]
}

It would be sweet if aws_dimension_select_regex were mentioned in the README.
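
For reference, the JSON above is equivalent to this YAML, in the style of the README examples:

```yaml
---
region: us-east-1
metrics:
- aws_namespace: AWS/DynamoDB
  aws_metric_name: ProvisionedReadCapacityUnits
  aws_dimensions: [TableName]
  aws_dimension_select_regex:
    TableName: ["(.*)_production"]
```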

Exception in thread "main" java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.Integer

I have a config like this:

{
  "region": "eu-west-1",
  "metrics": [
    {"aws_namespace": "AWS/Kinesis", "aws_metric_name": "IncomingRecords",
     "aws_dimensions": ["StreamName"], "aws_statistics": ["Sum"],
     "period_seconds": 300}
  ]
}

When starting the exporter it results in:

Exception in thread "main" java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.Integer
2015-02-24 14:56:05,359 DEBG 'cloudwatchexporter' stderr output:

    at io.prometheus.cloudwatch.CloudWatchCollector.<init>(CloudWatchCollector.java:110)
    at io.prometheus.cloudwatch.CloudWatchCollector.<init>(CloudWatchCollector.java:50)
    at io.prometheus.cloudwatch.WebServer.main(WebServer.java:15)
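
The trace suggests the config parser hands back a Long for the numeric scalar while the collector casts it to Integer. A defensive pattern is to go through Number instead of casting directly; this is an illustrative sketch with made-up names, not the exporter's actual code:

```java
import java.util.HashMap;
import java.util.Map;

public class ConfigPeriod {
    // Read an integer option that a YAML/JSON parser may have
    // materialized as either Integer or Long.
    static int intOption(Map<String, Object> config, String key, int defaultValue) {
        Object value = config.get(key);
        if (value == null) {
            return defaultValue;
        }
        return ((Number) value).intValue(); // safe for Integer and Long alike
    }

    public static void main(String[] args) {
        Map<String, Object> config = new HashMap<>();
        config.put("period_seconds", 300L); // a YAML parser may hand back a Long here
        System.out.println(intOption(config, "period_seconds", 60)); // prints 300
    }
}
```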

No tags in docker hub repo

I've noticed that the only existing tag in the docker hub repo is latest.
Can this be changed going forward, so that images get proper version tags for each release?

In this case I downloaded the image, tagged it (as 0.5-SNAPSHOT), and pushed it to our own repo. If you create tags, we'd be happy to use this tool directly from Docker Hub.

Configured metrics not being scraped

I'm using a configuration that looks like the attached exhibit 1 below, and get back the metrics in exhibit 2. The lambda and dynamo metrics are missing. I would expect that the logs would show why the metrics are not being gathered, but there is no further information.

Why is this config breaking?

What part of the code would you add logging to, so that it's easier to fix broken configs in the future?
I'd happily submit a PR, but have no idea where to start.

Exhibit 1:

apiVersion: v1
kind: ConfigMap
metadata:
  name: cloudwatch-config
data:
  config.yml: |
    region: us-west-2
    metrics:
    - aws_namespace: AWS/Lambda
      aws_metric_name: Errors
      aws_dimensions: [FunctionName, Resource, Version, Alias]
      aws_statistics: [Count]

    - aws_namespace: AWS/Lambda
      aws_metric_name: "Dead Letter Error"
      aws_dimensions: [FunctionName, Resource, Version, Alias]
      aws_statistics: [Count]

    - aws_namespace: AWS/Lambda
      aws_metric_name: Throttles
      aws_dimensions: [FunctionName, Resource, Version, Alias]
      aws_statistics: [Count]

    - aws_namespace: AWS/Lambda
      aws_metric_name: Duration
      aws_dimensions: [FunctionName, Resource, Version, Alias]
      aws_statistics: [Milliseconds]

    - aws_namespace: AWS/ES
      aws_metric_name: FreeStorageSpace
      aws_dimensions: [DomainName, ClientId]
      aws_statistics: [Minimum]

    - aws_namespace: AWS/ES
      aws_metric_name: SearchableDocuments
      aws_dimensions: [DomainName, ClientId]
      aws_statistics: [Sum, Minimum, Maximum]

    - aws_namespace: AWS/ES
      aws_metric_name: CPUUtilization
      aws_dimensions: [DomainName, ClientId]
      aws_statistics: [Maximum, Average]

    - aws_namespace: AWS/ES
      aws_metric_name: "ClusterStatus.yellow"
      aws_dimensions: [DomainName, ClientId]
      aws_statistics: [Minimum, Maximum]

    - aws_namespace: AWS/ES
      aws_metric_name: "ClusterStatus.red"
      aws_dimensions: [DomainName, ClientId]
      aws_statistics: [Minimum, Maximum]

    - aws_namespace: AWS/DynamoDB
      aws_metric_name: ReadThrottleEvents
      aws_dimensions: [TableName, GlobalSecondaryIndexName]
      aws_statistics: [Sum, SampleCount]

Exhibit 2:

# HELP cloudwatch_requests_total API requests made to CloudWatch
# TYPE cloudwatch_requests_total counter
cloudwatch_requests_total 81855.0
# HELP aws_es_free_storage_space_minimum CloudWatch metric AWS/ES FreeStorageSpace Dimensions: [DomainName, ClientId] Statistic: Minimum Unit: Megabytes
# TYPE aws_es_free_storage_space_minimum gauge
aws_es_free_storage_space_minimum{job="aws_es",instance="",domain_name="es-hp-id-dev-us-west-2",client_id="871386769552",} 212726.246
# HELP aws_es_searchable_documents_sum CloudWatch metric AWS/ES SearchableDocuments Dimensions: [DomainName, ClientId] Statistic: Sum Unit: Count
# TYPE aws_es_searchable_documents_sum gauge
aws_es_searchable_documents_sum{job="aws_es",instance="",domain_name="es-hp-id-dev-us-west-2",client_id="871386769552",} 1.2435096795E10
# HELP aws_es_searchable_documents_minimum CloudWatch metric AWS/ES SearchableDocuments Dimensions: [DomainName, ClientId] Statistic: Minimum Unit: Count
# TYPE aws_es_searchable_documents_minimum gauge
aws_es_searchable_documents_minimum{job="aws_es",instance="",domain_name="es-hp-id-dev-us-west-2",client_id="871386769552",} 1.243503844E9
# HELP aws_es_searchable_documents_maximum CloudWatch metric AWS/ES SearchableDocuments Dimensions: [DomainName, ClientId] Statistic: Maximum Unit: Count
# TYPE aws_es_searchable_documents_maximum gauge
aws_es_searchable_documents_maximum{job="aws_es",instance="",domain_name="es-hp-id-dev-us-west-2",client_id="871386769552",} 1.24351936E9
# HELP aws_es_cpuutilization_maximum CloudWatch metric AWS/ES CPUUtilization Dimensions: [DomainName, ClientId] Statistic: Maximum Unit: Percent
# TYPE aws_es_cpuutilization_maximum gauge
aws_es_cpuutilization_maximum{job="aws_es",instance="",domain_name="es-hp-id-dev-us-west-2",client_id="871386769552",} 6.0
# HELP aws_es_cpuutilization_average CloudWatch metric AWS/ES CPUUtilization Dimensions: [DomainName, ClientId] Statistic: Average Unit: Percent
# TYPE aws_es_cpuutilization_average gauge
aws_es_cpuutilization_average{job="aws_es",instance="",domain_name="es-hp-id-dev-us-west-2",client_id="871386769552",} 4.714285714285714
# HELP aws_es_cluster_status_yellow_minimum CloudWatch metric AWS/ES ClusterStatus.yellow Dimensions: [DomainName, ClientId] Statistic: Minimum Unit: Count
# TYPE aws_es_cluster_status_yellow_minimum gauge
aws_es_cluster_status_yellow_minimum{job="aws_es",instance="",domain_name="es-hp-id-dev-us-west-2",client_id="871386769552",} 0.0
# HELP aws_es_cluster_status_yellow_maximum CloudWatch metric AWS/ES ClusterStatus.yellow Dimensions: [DomainName, ClientId] Statistic: Maximum Unit: Count
# TYPE aws_es_cluster_status_yellow_maximum gauge
aws_es_cluster_status_yellow_maximum{job="aws_es",instance="",domain_name="es-hp-id-dev-us-west-2",client_id="871386769552",} 0.0
# HELP aws_es_cluster_status_red_minimum CloudWatch metric AWS/ES ClusterStatus.red Dimensions: [DomainName, ClientId] Statistic: Minimum Unit: Count
# TYPE aws_es_cluster_status_red_minimum gauge
aws_es_cluster_status_red_minimum{job="aws_es",instance="",domain_name="es-hp-id-dev-us-west-2",client_id="871386769552",} 0.0
# HELP aws_es_cluster_status_red_maximum CloudWatch metric AWS/ES ClusterStatus.red Dimensions: [DomainName, ClientId] Statistic: Maximum Unit: Count
# TYPE aws_es_cluster_status_red_maximum gauge
aws_es_cluster_status_red_maximum{job="aws_es",instance="",domain_name="es-hp-id-dev-us-west-2",client_id="871386769552",} 0.0
# HELP cloudwatch_exporter_scrape_duration_seconds Time this CloudWatch scrape took, in seconds.
# TYPE cloudwatch_exporter_scrape_duration_seconds gauge
cloudwatch_exporter_scrape_duration_seconds 0.382327836
# HELP cloudwatch_exporter_scrape_error Non-zero if this scrape failed.
# TYPE cloudwatch_exporter_scrape_error gauge
cloudwatch_exporter_scrape_error 0.0

Export empty metrics as 0

Thanks for this exporter, it's really great to consolidate all metrics in one Prometheus instance!

I'm having an issue while setting up alerts, though: if there are no 5XX errors, that metric isn't exported at all.

I was trying to calculate the error percentage with something like aws_elb_httpcode_backend_5_xx_sum / aws_elb_request_count_sum, but as the metric isn't exported, I don't get any data, not even 0.

Is this something that can be fixed? Thanks!
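
Until absent series are exported as 0, a common PromQL workaround is to fall back to a zero-valued series via `or`, using the always-present request count to supply the label set:

```promql
(aws_elb_httpcode_backend_5_xx_sum or aws_elb_request_count_sum * 0)
  / aws_elb_request_count_sum
```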

ELB/ALB metrics not available

The current version doesn't support the new Application Load Balancer metrics. We migrated our Load Balancers from the Classic version to the new ALB and all of our metrics disappeared.

I used the following config, which is similar to the example in the README.


region: us-east-1
metrics:
- aws_namespace: AWS/ELB
  aws_metric_name: RequestCount
  aws_dimensions: [AvailabilityZone, LoadBalancerName]

I do have data in the CloudWatch dashboard.

Adding the new TargetGroup dimension results in no data being returned, even for Classic Load Balancer metrics.

Scraping stops working from time to time

We have deployed the latest version of the cloudwatch-exporter. We noticed that it sometimes stops getting metrics from AWS and never recovers; we have to restart it to fix it. What could be the cause? Maybe the size of the response? I included some logs below:

WARNING: CloudWatch scrape failed
Message: Read timed out). Response Code: 200, Response Text: OK
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleResponse(AmazonHttpClient.java:1525)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1035)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:721)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:704)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:672)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:654)
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:518)
	at com.amazonaws.services.cloudwatch.AmazonCloudWatchClient.doInvoke(AmazonCloudWatchClient.java:965)
	at com.amazonaws.services.cloudwatch.AmazonCloudWatchClient.listMetrics(AmazonCloudWatchClient.java:684)
	at io.prometheus.cloudwatch.CloudWatchCollector.getDimensions(CloudWatchCollector.java:188)
	at io.prometheus.cloudwatch.CloudWatchCollector.scrape(CloudWatchCollector.java:329)
	at java.util.Collections.list(Collections.java:3688)
	at io.prometheus.client.exporter.MetricsServlet.doGet(MetricsServlet.java:40)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:735)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
	at org.eclipse.jetty.server.Server.handle(Server.java:365)
	at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485)
	at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:926)
	at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:988)
	at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:635)
	at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
	at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:51)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
	at java.lang.Thread.run(Thread.java:745)
	at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:599)
	at com.amazonaws.transform.StaxUnmarshallerContext.nextEvent(StaxUnmarshallerContext.java:220)
com.amazonaws.SdkClientException: Unable to unmarshall response (ParseError at [row,col]:[1039,14]
	at com.amazonaws.services.cloudwatch.model.transform.ListMetricsResultStaxUnmarshaller.unmarshall(ListMetricsResultStaxUnmarshaller.java:30)
	at com.amazonaws.http.StaxResponseHandler.handle(StaxResponseHandler.java:101)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleResponse(AmazonHttpClient.java:1501)
	... 41 more
Sep 24, 2017 1:45:24 AM io.prometheus.cloudwatch.CloudWatchCollector collect
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1222)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:747)
	at com.amazonaws.services.cloudwatch.AmazonCloudWatchClient.invoke(AmazonCloudWatchClient.java:941)
	at io.prometheus.cloudwatch.CloudWatchCollector.collect(CloudWatchCollector.java:410)
	at io.prometheus.client.CollectorRegistry$MetricFamilySamplesEnumeration.findNextElement(CollectorRegistry.java:143)
	at io.prometheus.client.CollectorRegistry$MetricFamilySamplesEnumeration.nextElement(CollectorRegistry.java:158)
	at io.prometheus.client.CollectorRegistry$MetricFamilySamplesEnumeration.nextElement(CollectorRegistry.java:128)
	at io.prometheus.client.exporter.common.TextFormat.write004(TextFormat.java:22)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:848)
	at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:648)
	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072)
	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382)
	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006)
	at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
	at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:627)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
Caused by: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1039,14]
Message: Read timed out
	at com.sun.xml.internal.stream.XMLEventReaderImpl.peek(XMLEventReaderImpl.java:275)
	at com.amazonaws.services.cloudwatch.model.transform.DimensionStaxUnmarshaller.unmarshall(DimensionStaxUnmarshaller.java:40)
	at com.amazonaws.services.cloudwatch.model.transform.ListMetricsResultStaxUnmarshaller.unmarshall(ListMetricsResultStaxUnmarshaller.java:54)
	at com.amazonaws.http.StaxResponseHandler.handle(StaxResponseHandler.java:43)
	at com.amazonaws.http.response.AwsResponseHandlerAdapter.handle(AwsResponseHandlerAdapter.java:70)

Thanks!

Scrape duration

Hi,

I've got an issue with scraping the network metrics of our instances. A scrape takes a very long time, 18 seconds or more. Scraping other metrics, like CPU utilization, or scraping the same metric for an AutoScalingGroupName, works fine.
Also, I cannot get metrics for an individual instance; the select filter appears to have no effect. It returns all instances, even when the filter selects a single instance.

I've attached a file with the configuration of the scraper and the output.

Any ideas?
Thanks
Stef

scraping.txt

Rule help text override is ignored

According to the code, it's possible to add a help tag to a JSON metrics object, which, I guess, should then be used in the metrics output for Prometheus. For example:

{
  "metrics": [
    {
      "help": "test",
      "aws_namespace": "AWS/DynamoDB",
      "aws_metric_name": "ProvisionedReadCapacityUnits",
      "aws_dimensions": [ ... ],
      "aws_dimension_select_regex": { ... }
    }
  ]
}

However, when compiling the final help text for Prometheus, this override is ignored. Instead of returning rule.help, a generated help string is always returned.

Export empty instance label

Assuming one isn't already set, we should export an empty instance label. This means that when the cloudwatch exporter moves machines, the time series names won't change.

Use semver

All other repos seem to use semver, except this one...

Expose period_seconds as a metric

I am currently exporting some DynamoDB metrics with the exporter:

     - aws_namespace: AWS/DynamoDB
       aws_metric_name: ConsumedReadCapacityUnits
       aws_dimensions: [TableName]
       aws_statistics: [Sum]
     - aws_namespace: AWS/DynamoDB
       aws_metric_name: ConsumedWriteCapacityUnits
       aws_dimensions: [TableName]
       aws_statistics: [Sum]

In my charts I actually want to display the average amount of consumed capacity. Unfortunately, just pulling the Average value for these metrics from CloudWatch is not really helpful, as it gives incorrect data.
In my charts I now have to divide the Sum value by the period_seconds value (600 in my case). As this might change, it would be very helpful to have period_seconds exposed as a metric, for use in Grafana dashboards.
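
Until period_seconds is exposed, one way to avoid hard-coding the divisor in every dashboard is a Prometheus recording rule; the 600 below mirrors the period_seconds from the scrape config, and the rule names are illustrative:

```yaml
groups:
- name: dynamodb-capacity
  rules:
  - record: aws_dynamodb_consumed_read_capacity_units:per_second
    expr: aws_dynamodb_consumed_read_capacity_units_sum / 600
  - record: aws_dynamodb_consumed_write_capacity_units:per_second
    expr: aws_dynamodb_consumed_write_capacity_units_sum / 600
```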

Add a health check endpoint

As a user of the CloudWatch exporter I want to be able to deploy the exporter on ECS behind an ELB without burning money at huge scale

Current situation

  • The ELB needs a health check endpoint which responds with HTTP 200
  • The only exposed path in the exporter is /metrics
  • Every request against /metrics does CloudWatch calls
  • Health checks are expensive (N CloudWatch requests every 10s)

Expected situation

  • The ELB needs a health check endpoint which responds with HTTP 200
  • There is an exposed path /status which responds with HTTP 200, reporting that everything is fine (at least the Java application is available to respond)
  • Health check requests do not burn money
  • (I don't need to explain the huge AWS bill to my boss 😄 )

Not getting EC2 data

I have set up the cloudwatch exporter using the default template, and I can get ELB and ElastiCache data, but I don't get any EC2 data when I add the following:

- aws_namespace: AWS/EC2
  aws_metric_name: CPUUtilization
  aws_dimensions: [InstanceId]

I do get data if I run: aws cloudwatch list-metrics --namespace AWS/EC2 --metric-name CPUUtilization

So what am I doing wrong?

Also, besides example.yml, is there a template that covers most or all of the other namespaces?

[Question] Exporting SQS metric

---
region: ap-south-1
metrics:
- aws_namespace: AWS/SQS
  aws_metric_name: ApproximateNumberOfMessagesVisible
  aws_dimensions: [QueueName]
  aws_statistics: [Average]

I deployed the Prometheus cloudwatch exporter in Kubernetes using the above config. I am looking for aws_sqs_approximate_number_of_messages_visible_average in the Prometheus UI, but it does not show up there.
The logs show no errors and the IAM permissions are correctly configured. I am able to access the metrics from the AWS CLI.

Please suggest how to debug the issue.

Can't specify region for a metric

It would be nice if this were possible, as I am trying to get AWS/Billing metrics, which only exist in us-east-1, from an exporter configured for a different region.

I can easily run two cloudwatch_exporters, but wouldn't it be nice to be able to just do:

region: us-west-2
metrics:
  # ELB Metrics
- aws_namespace: AWS/ELB
  aws_metric_name: HealthyHostCount
  aws_dimensions: [LoadBalancerName]
  aws_statistics: [Average,Maximum]

  [...]

   # Billing is strange, only exists in us-east-1
 - aws_namespace: AWS/Billing
   region: us-east-1
   range_seconds: 21600
   aws_metric_name: EstimatedCharges
   aws_dimensions: [ServiceName,Currency]
   aws_dimension_select:
     Currency: [USD]

AWS/ELB metrics do not work

Why can't I get data for the HTTPCode_Backend_2XX, HTTPCode_Backend_4XX and HTTPCode_Backend_5XX metrics? Thanks.

my config file:

---
region: us-west-1
metrics:
- aws_namespace: AWS/ELB
  aws_metric_name: HealthyHostCount
  aws_dimensions: [AvailabilityZone, LoadBalancerName]
  aws_statistics: [Average]

- aws_namespace: AWS/ELB
  aws_metric_name: UnHealthyHostCount
  aws_dimensions: [AvailabilityZone, LoadBalancerName]
  aws_statistics: [Average]

- aws_namespace: AWS/ELB
  aws_metric_name: RequestCount
  aws_dimensions: [AvailabilityZone, LoadBalancerName]
  aws_statistics: [Sum]

- aws_namespace: AWS/ELB
  aws_metric_name: Latency
  aws_dimensions: [AvailabilityZone, LoadBalancerName]
  aws_statistics: [Average]

- aws_namespace: AWS/ELB
  aws_metric_name: HTTPCode_Backend_2XX
  aws_dimensions: [AvailabilityZone, LoadBalancerName]
  aws_statistics: [Sum]

- aws_namespace: AWS/ELB
  aws_metric_name: HTTPCode_Backend_4XX
  aws_dimensions: [AvailabilityZone, LoadBalancerName]
  aws_statistics: [Sum]

- aws_namespace: AWS/ELB
  aws_metric_name: HTTPCode_Backend_5XX
  aws_dimensions: [AvailabilityZone, LoadBalancerName]
  aws_statistics: [Sum]

Lots of requests for simple metrics

With this configuration:

---
region: ap-southeast-1
metrics:
 - aws_namespace: AWS/ECS
   aws_metric_name: CPUUtilization
   aws_dimensions: [ClusterName, ServiceName]
   aws_statistics: [SampleCount, Sum, Minimum, Maximum, Average]
 - aws_namespace: AWS/ECS
   aws_metric_name: MemoryUtilization
   aws_dimensions: [ClusterName, ServiceName]
   aws_statistics: [SampleCount, Sum, Minimum, Maximum, Average]
 - aws_namespace: AWS/ECS
   aws_metric_name: CPUReservation
   aws_dimensions: [ClusterName]
   aws_statistics: [Average]
 - aws_namespace: AWS/ECS
   aws_metric_name: MemoryReservation
   aws_dimensions: [ClusterName]
   aws_statistics: [Average]

This causes 58 requests upon every scrape.

Is there no way these can be batched?

S3 metrics not being pulled

I've been using the Docker container for a while and I thought everything was working correctly. I just noticed I'm not getting S3 metrics. I see data for EC2, ELB, ElastiCache, RDS, etc. If I go into the CloudWatch console on AWS I can see S3 metrics; it just looks like the S3 numbers are not being pulled in.

Here is my test cloudwatch.yml file:


region: us-east-1
metrics:

  - aws_namespace: AWS/S3
    aws_metric_name: BucketSizeBytes
    aws_dimensions: [BucketName]
  - aws_namespace: AWS/S3
    aws_metric_name: NumberOfObjects
    aws_dimensions: [BucketName]

Support filtering by tags

The only way to filter metrics right now is aws_dimension_select(_regex), which requires the user to control the names of those dimensions and to encode in them the information needed to assign a cloudwatch_exporter.
I'd argue it's common to have multiple stacks or environments in one AWS account, which makes filtering by the existing dimensions cumbersome. Moreover, if CloudFormation is used, you can't even name some resources, such as ElastiCache clusters (or you lose the ability to update the cluster).

If cloudwatch_exporter could filter by tags, a user could simply set a tag to define which metrics should be scraped by which exporter. Since the CloudWatch API itself doesn't support filtering by tags, the exporter would have to implement it per metric source ("aws_namespace"). I'll use AWS/ELB as an example:

  1. Get all ELBs in a region
  2. Run describe-tags on the ELBs to find the one with the provided tag(s)
  3. Use the ELB's name as a value for the LoadBalancerName dimension filter
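
The aws_tag_select feature described in the README above takes this approach. A sketch for the ELB example, where the Environment tag and its value are illustrative:

```yaml
---
region: eu-west-1
metrics:
- aws_namespace: AWS/ELB
  aws_metric_name: RequestCount
  aws_dimensions: [AvailabilityZone, LoadBalancerName]
  aws_tag_select:
    tag_selections:
      Environment: ["production"]
    resource_type_selection: "elasticloadbalancing:loadbalancer"
    resource_id_dimension: LoadBalancerName
  aws_statistics: [Sum]
```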

Custom Metrics not being exported

I've configured a custom CloudWatch metric like this:

metrics:
- aws_namespace: Custom/PageStats
  aws_metric_name: Page_Generation_Time
  aws_statistics: [Average]

And it's not being exported.

Is this supported?

Scraping fails when trying to scrape a certain metric from more than one target of the same type

I'm trying to use the cloudwatch exporter to monitor several EC2 instances that are divided into two logical groups.
The cloudwatch configuration looks something like this:

...
...
# EC2 group 1
- aws_namespace: AWS/EC2
  aws_metric_name: CPUUtilization
  aws_dimensions: [InstanceId]
  aws_dimension_select:
   InstanceId: [i-X, i-Y, i-Z]
...
...
# EC2 group 2
- aws_namespace: AWS/EC2
  aws_metric_name: CPUUtilization
  aws_dimensions: [InstanceId]
  aws_dimension_select:
   InstanceId: [i-A, i-B, i-C]

After starting the container and looking at the prometheus I get the following error:

format parsing error in line xxx: second HELP line for metric name "aws_ec2_cpuutilization_sum"

and indeed the scraped data has 2 HELP lines per metric.

Shouldn't the exporter merge the two metrics into the same block of the scraped output, with a single HELP line for the block?

Thanks!
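
Until the exporter deduplicates HELP/TYPE lines, one workaround is to merge the two rules so the metric is declared once, combining both instance lists in a single select (group membership can then be reconstructed on the Prometheus side, e.g. via relabelling):

```yaml
- aws_namespace: AWS/EC2
  aws_metric_name: CPUUtilization
  aws_dimensions: [InstanceId]
  aws_dimension_select:
    InstanceId: [i-X, i-Y, i-Z, i-A, i-B, i-C]
```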

Multiple same HELP/TYPE lines for DynamoDB metrics

We're pulling DynamoDB metrics from CloudWatch, and we're interested in individual table metrics (e.g. ConsumedReadCapacity for a table foo_production), and the global secondary index metrics on a table (e.g. ConsumedReadCapacity for global secondary index bar on the foobar_production table).

Our configuration for cloudwatch-exporter looks like this:

- aws_namespace: "AWS/DynamoDB"
  aws_metric_name: "ProvisionedReadCapacityUnits"
  aws_dimensions:
    - TableName
  aws_dimension_select_regex:
    TableName:
      - "(.*)_production"

- aws_namespace: "AWS/DynamoDB"
  aws_metric_name: "ProvisionedReadCapacityUnits"
  aws_dimensions:
    - TableName
    - GlobalSecondaryIndexName
  aws_dimension_select_regex:
    TableName:
      - "(.*)_production"

However, this leads to two occurrences of the same HELP and TYPE lines for these metrics:

# HELP aws_dynamodb_provisioned_read_capacity_units_sum CloudWatch metric AWS/DynamoDB ProvisionedReadCapacityUnits Dimensions: [TableName] Statistic: Sum Unit: Count
# TYPE aws_dynamodb_provisioned_read_capacity_units_sum gauge
...
# HELP aws_dynamodb_provisioned_read_capacity_units_sum CloudWatch metric AWS/DynamoDB ProvisionedReadCapacityUnits Dimensions: [TableName, GlobalSecondaryIndexName] Statistic: Sum Unit: Count
# TYPE aws_dynamodb_provisioned_read_capacity_units_sum gauge

Prometheus doesn't like this: text format parsing error in line 111: second HELP line for metric name "aws_dynamodb_provisioned_read_capacity_units_sum"

I was hoping #11 would fix this, but of course even with a custom HELP text, the TYPE will still be duplicated, and thus be invalid.

Is there a way to get around this, and have both table and global secondary index data in Prometheus?

Docker images don't have tags

It looks like prom/cloudwatch-exporter in Hub is set to autobuild from master on this repository, and to tag the image as latest: https://hub.docker.com/r/prom/cloudwatch-exporter/builds/

If latest is the only tag available, there's no way to guarantee that two containers using the "latest" tag are actually running the same code. This is also a problem if you build your own container using "FROM prom/cloudwatch-exporter" - no way to guarantee you're starting from the same version.

It would be great if you could set up Hub to also autobuild based on repo tag (this is an option in the Type dropdown on the Build Settings tab in Hub) and then add version tags to this repo (v0.1, v0.2).

This isn't an official Docker request, just asking because we're using this container and it would make deploying it more deterministic. Thanks!

Unnecessarily hard to override config file location.

The Dockerfile specifies the path to the config file as part of the entrypoint:

ENTRYPOINT [ "java", "-jar", "/cloudwatch_exporter.jar", "9106", "/config.yml" ]

I run Kubernetes and would like to mount the config as a volume; however, I can't mount the volume at the root, as that would make the rest of the file system unreadable. I could override the entrypoint to change the location, but then I'd have to specify the entire entrypoint command.

  • Suggestion 1: Use /etc/cloudwatch-exporter as the default path, analogous with other prometheus binaries.

  • Suggestion 2: Move /config.yml to a CMD statement instead, so it can more easily be overridden in the Kubernetes 'args' config, e.g.:

    ENTRYPOINT [ "java", "-jar", "/cloudwatch_exporter.jar", "9106" ]
    CMD [ "/config.yml" ]
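With suggestion 2 in place, the config path could then be overridden from a pod spec without touching the entrypoint (hypothetical manifest fragment; names are illustrative):

```yaml
containers:
  - name: cloudwatch-exporter
    image: prom/cloudwatch-exporter
    args: ["/etc/cloudwatch-exporter/config.yml"]
    volumeMounts:
      - name: config
        mountPath: /etc/cloudwatch-exporter
```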

Cannot connect via proxy

Hi,

I'm trying to use this exporter but cannot connect via the proxy. I have set the environment variables http_proxy/https_proxy and supplied the proxy as arguments to java, but it still doesn't work:

Mar 21, 2016 1:25:58 PM com.amazonaws.http.AmazonHttpClient executeHelper
INFO: Unable to execute HTTP request: connect timed out
java.net.SocketTimeoutException: connect timed out
at java.net.PlainSocketImpl.socketConnect(Native Method)
..
Mar 21, 2016 1:25:58 PM io.prometheus.cloudwatch.CloudWatchCollector collect
WARNING: CloudWatch scrape failed
com.amazonaws.AmazonClientException: Unable to execute HTTP request: connect timed out
..

Starting with:
$ java -Dhttp.useProxy=true -Dhttps.useProxy=true -Dhttps.proxyHost=http://proxy-dev.abc.com -Dhttps.proxyPort=3128 -Dhttp.proxyHost=http://proxy-dev.abc.com -Dhttp.proxyPort=3128 -jar ./target/cloudwatch_exporter-0.2-SNAPSHOT-jar-with-dependencies.jar 9106 config.yml

Surely I am not the only one using a proxy? How can I fix this?

Thanks
Stef
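One thing worth noting about the command above: the Java http.proxyHost/https.proxyHost system properties expect a bare hostname, not a URL, so "http://proxy-dev.abc.com" is unlikely to resolve. A corrected invocation (same proxy assumed) would look like:

```
java -Dhttps.proxyHost=proxy-dev.abc.com -Dhttps.proxyPort=3128 \
     -Dhttp.proxyHost=proxy-dev.abc.com -Dhttp.proxyPort=3128 \
     -jar ./target/cloudwatch_exporter-0.2-SNAPSHOT-jar-with-dependencies.jar 9106 config.yml
```

Whether the AWS SDK honors these JVM proxy properties depends on the SDK version and client configuration, so this alone may not be sufficient.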
