
elb-metrics's Introduction


elb-metrics

Node module which collects metrics from an AWS Elastic Load Balancer and makes assertions about the performance of the stack by calculating the number of requests that returned 2xx, 3xx, 4xx, and 5xx responses, along with the time taken to get back a response (latency).

Install

git clone git@github.com:mapbox/elb-metrics.git
cd elb-metrics
npm install 
npm link 

You can use elb-metrics as a command line tool:

usage: elb-metrics --startTime --endTime --region --elbname
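For example, assuming millisecond unix timestamps like the ones in the test case later in this document (my-elb is a placeholder name):

elb-metrics --startTime 1471356157000 --endTime 1471356167000 --region us-east-1 --elbname my-elb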

Note: The values you provide for startTime and endTime should be unix timestamps. The region is the AWS region of the Elastic Load Balancer, and the elbname is the name of the Elastic Load Balancer, which you can find in the AWS console.

Tests

After cloning and installing elb-metrics, you can run the tests with:

npm test

elb-metrics's People

Contributors

aarthykc, tmcw


elb-metrics's Issues

set up CI

Let's get this running on CI once tests are merged, so we don't break anything later on!

command line tool

Add a function to bin so that we can run it like:

elb-metrics --startTime --endTime

input / output

Let's keep it simple in terms of what this tool provides:

  • Input: the bare minimum params you need to retrieve ELB metrics.
  • Output: an object that provides a high-level overview of results.

The current bench tool provides a high-level overview of the whole time period, like:

### Bench (600s)

- requests:     225.2/s
- connect mean: 142.0ms
- finish mean:  142.1ms
- payload mean: 3315.2
- non 200/304:  1602 (1.2%)
- cacheHit:     0 (0.0%)
- cacheRefresh: 0 (0.0%)
- cacheMiss:    0 (0.0%)
- cacheUnknown: 135119 (100.0%)

I would like to see us do the same. How about we return

  • Time period (number of seconds)
  • Average requests/second over the period
  • Average latency over the period
  • Percentage of total requests that were 2xx
  • Percentage of total requests that were 3xx
  • Percentage of total requests that were 4xx
  • Percentage of total requests that were 5xx

Here's an example of how I want to interact with this utility:

// Assuming a tape-style test harness and the module's main export:
var test = require('tape');
var elbMetrics = require('../');

test('give me metrics', function(assert) {
  var params = {
    start: 1471356157000,
    end: 1471356167000,
    elbName: 'my-elb',
    region: 'us-east-1'
  };

  elbMetrics(params, function(err, res) {
    assert.ifError(err);
    assert.deepEqual(res, {
      'period': '10s',
      'requests': '500/s',
      'mean latency': '100ms',
      '2xx': '67%',
      '3xx': '20%',
      '4xx': '10%',
      '5xx': '3%'
    });
    assert.end();
  });
});

The elbMetrics function will be doing more work behind the scenes, and should be composed of other functions that we test individually:

  1. take user parameters and combine them with other assumed parameters that we need to make a request to CloudWatch
  2. make the request to CloudWatch
  3. parse the results and do some math to find the averages/values we are interested in
  4. prepare the final object based on those values ^ and return it

Even though there's a lot of different work to do, the idea is to expose a single function, elbMetrics, to the user.
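A minimal sketch of that composition, assuming internal helpers named prepareQueries, getMetrics, and prepareResults (the names suggested in the issues below); error handling beyond passing errors to the callback is elided:

// Sketch only: combine user params with assumed defaults, query CloudWatch,
// do the math, and hand a single summary object back to the caller.
function elbMetrics(params, callback) {
  var queries;
  try {
    // 1. user params + assumed params -> CloudWatch query objects (synchronous, throws on bad input)
    queries = prepareQueries(params.start, params.end, params.elbName);
  } catch (err) {
    return callback(err);
  }

  // 2. make the requests to CloudWatch
  getMetrics(queries, params.region, function(err, rawMetrics) {
    if (err) return callback(err);

    // 3 + 4. parse the results, do the math, and shape the final object
    callback(null, prepareResults(rawMetrics, params.start, params.end));
  });
}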

@aarthykc do you feel 👍 about tackling each of those functions (and tests) separately and then putting them all together in a function like the test case above describes?

Engineering Standards Adherence

Required Elements

If any elements in the below list are not checked, this repo will fail standards compliance.

  • Not running node 4 or below
  • Has at least some test coverage?
  • Has a README?

Rubric

  • 1 pt Is in Version Control/Github ✅ (free points)
  • 1-2 pt node version:
    • 2 pt Best: running node 8+ 🏅
    • 1 pt Questionable: node 6
  • 1 pt CI enabled for repo?
  • 1 pt Not running Circle CI version 1? (Point awarded if using Travis)
  • 1 pt nyc integrated to show test coverage summary?
  • 1-3 pt test coverage percentage from nyc?
    • 3 pt High coverage: > 90%
    • 2 pt Moderate coverage: between 75 and 90% total coverage
    • 1 pt 0 - 74% test coverage
  • 1-2 pt evidence of bug fixes/edge cases being tested?
    • 2 pt Strong evidence/several instances noted
    • 1 pt Some evidence - I had a hard time deciding this one
  • 1 pt no flags to enable different functionality in non-test environments?
  • 1 pt Has README?
  • 1 pt Has CHANGELOG?
  • 1-2 pt README explains purpose of a project and how it works to some detail?
    • 2 pt High (but appropriate) amount of detail about the project
    • 1 pt Some detail about the project documented, could be more extensive
  • 1 pt README contains deploy/publish/install instructions?
  • 1 pt README contains CI badges, as appropriate?
  • 1-2 pt Code seems self-documenting: file/module names, function names, variables? No redundant comments to explain naming conventions?
    • 2 pt Strongly self-documented code, little to no improvements needed
    • 1 pt Some evidence of self-documenting code
  • 1 pt No potential security vulnerabilities are reported in dependencies?
  • 1 pt Package is scoped to @mapbox on NPM?
  • 1-2 pt master branch protected?
    • 1 pt PRs can only be merged if tests are passing?
    • 1 pt PRs must be approved before merging?
  • 2 pt BONUS: was this repo covered in a deep dive at some point? (part of bench, so, going to award points for that!)

Total possible: 24 points (+2 bonus)
Grading scale:

Point Total     Qualitative Description                                   Scaled Grade
20+ points      Strongly adheres to eng. standards                        5
16-19 points    Adheres to eng. standards fairly well                     4
12-15 points    Adheres to some eng. standards                            3
8-11 points     Starting to adhere to some eng. standards                 2
4-7 points      Following a limited number of eng. standard practices     1
< 4 points      Needs significant work, does not follow most standards    0

Repo grade: 15/24. Grade 3 (2018-08-08)

cc/ @mapbox/sreious-business

Prepare human-readable results

From #5, our goal for this project is some human-readable output like:

{
  'period': '10s',
  'requests': '500/s',
  'mean latency': '100ms',
  '2xx': '67%',
  '3xx': '20%',
  '4xx': '10%',
  '5xx': '3%'
}

Doing the math

  • period
    • Calculate this with the difference between startTime and endTime - should be in seconds, for now
  • requests
    • Add up all datapoints from RequestCount metric, and divide by period (ie number of seconds), to get number of requests per second
  • mean latency
    • Average of all datapoints from Latency metric
  • 2xx
    • Calculate total requests by adding all datapoints from RequestCount metric. Calculate total 2xx requests by adding all datapoints from HTTPCode_Backend_2XX metric. 2xx percentage will be total 2xx requests / total requests
  • 3xx
    • Rinse and repeat process from 2xx
  • 4xx
    • Rinse and repeat process from 2xx
  • 5xx
    • Rinse and repeat process from 2xx

Scope of work

✅ New (synchronous) function prepareResults

  • takes, as input, raw metrics (i.e. what comes back from your outputMetrics function). Also needs, as input, startTime and endTime
  • output should be an object that looks something like what is below, making sure that units are also included
{
  'period': '10s',
  'requests': '500/s',
  'mean latency': '100ms',
  '2xx': '67%',
  '3xx': '20%',
  '4xx': '10%',
  '5xx': '3%'
}

✅ Tests for prepareResults

  • test that all the math makes sense on a small scale (using small fixtures so you can do the math in your head to confirm it makes sense)
  • test that it handles edge cases gracefully -- e.g. what if there are no 5xx requests?
  • test any error cases we need to handle here

✅ Integrate into elbMetrics function

  • should be able to feed this output into your new prepareResults function, and return that response to the end user
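A minimal sketch of what prepareResults could look like, assuming rawMetrics is keyed the way the [getMetrics] response format issue below describes, that counts come back as CloudWatch Sum datapoints and latency as Average datapoints (in seconds), and that startTime/endTime are millisecond unix timestamps:

// Sketch only; assumes rawMetrics looks like
// { 'request count': [...], 'latency': [...], '2xx': [...], '3xx': [...], '4xx': [...], '5xx': [...] }
// with CloudWatch-style datapoints ({ Sum: n } for counts, { Average: n } for latency, in seconds).
function prepareResults(rawMetrics, startTime, endTime) {
  var periodSeconds = (endTime - startTime) / 1000;

  function sum(datapoints) {
    return datapoints.reduce(function(total, d) { return total + d.Sum; }, 0);
  }

  var totalRequests = sum(rawMetrics['request count']);

  function percentage(key) {
    // Guard against an empty window so we never divide by zero.
    if (totalRequests === 0) return '0%';
    return Math.round(100 * sum(rawMetrics[key]) / totalRequests) + '%';
  }

  var latency = rawMetrics.latency;
  var meanLatencySeconds = latency.length === 0 ? 0 :
    latency.reduce(function(total, d) { return total + d.Average; }, 0) / latency.length;

  return {
    'period': periodSeconds + 's',
    'requests': Math.round(totalRequests / periodSeconds) + '/s',
    'mean latency': Math.round(meanLatencySeconds * 1000) + 'ms',
    '2xx': percentage('2xx'),
    '3xx': percentage('3xx'),
    '4xx': percentage('4xx'),
    '5xx': percentage('5xx')
  };
}

For example, a 10-second window with 5,000 total requests and a 0.1 s mean latency would yield the '10s' / '500/s' / '100ms' values in the object above.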

Compatibility with ELB2

ELB 2 / Application Load Balancer uses a different CloudWatch namespace, and some of the metrics are named differently (e.g. "Latency" becomes "TargetResponseTime"). It would be great to be compatible with both versions of ELB. Can this be detected automatically, or passed as a flag, so that the right metrics are returned whether you're using classic ELB or ELB 2?

cc @aarthykc @yhahn
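One way to handle it could be a small lookup keyed by load balancer generation. The namespaces and metric names below are CloudWatch's actual names for classic ELB and ALB; the shape of the lookup and the elbVersion flag are assumptions for illustration only:

// Sketch only: the metric-name and namespace differences between classic ELB and ALB.
var METRICS = {
  classic: {
    namespace: 'AWS/ELB',
    dimensionName: 'LoadBalancerName',
    requestCount: 'RequestCount',
    latency: 'Latency',
    code2xx: 'HTTPCode_Backend_2XX',
    code5xx: 'HTTPCode_Backend_5XX'
  },
  alb: {
    namespace: 'AWS/ApplicationELB',
    dimensionName: 'LoadBalancer',
    requestCount: 'RequestCount',
    latency: 'TargetResponseTime',
    code2xx: 'HTTPCode_Target_2XX_Count',
    code5xx: 'HTTPCode_Target_5XX_Count'
  }
};

// Hypothetical flag: pick the right set of names up front, then build queries as usual.
function metricNamesFor(elbVersion) {
  return elbVersion === 'alb' ? METRICS.alb : METRICS.classic;
}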

[getMetrics] prepare all of the queries!

We're close with getMetrics, but we need to make sure it gathers everything we're looking for. I'm seeing a need for a synchronous function that prepares all of your CloudWatch queries. It could be called something like prepareQueries or prepareParams.

metric name             statistic to look for
HTTPCode_Backend_2XX    count
HTTPCode_Backend_3XX    count
HTTPCode_Backend_4XX    count
HTTPCode_Backend_5XX    count
RequestCount            count
Latency                 average

@aarthykc let's add this to your branch in #1 and close this issue when you have the following:

  • new function in index.js that takes startTime, endTime, elbname and returns all of the params objects you will need (in an array)
  • new tests in index.test.js that:
    • tests successful cases
    • tests error cases (e.g. if some input is missing, or if the input is the incorrect type) -- remember this is a synchronous function, so we should throw ^_^

Once we have ^ we can close here and worry about integration back in #1.
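A minimal sketch of what such a function might return, using CloudWatch's getMetricStatistics parameter shape for classic ELB (the 60-second Period and the mapping of "count" to CloudWatch's Sum statistic are assumptions):

// Sketch only: one getMetricStatistics params object per metric in the table above.
// 'count' in the table maps to CloudWatch's Sum statistic here, and the 60-second
// Period is an assumed granularity.
function prepareQueries(startTime, endTime, elbname) {
  var metrics = [
    { name: 'HTTPCode_Backend_2XX', statistic: 'Sum' },
    { name: 'HTTPCode_Backend_3XX', statistic: 'Sum' },
    { name: 'HTTPCode_Backend_4XX', statistic: 'Sum' },
    { name: 'HTTPCode_Backend_5XX', statistic: 'Sum' },
    { name: 'RequestCount', statistic: 'Sum' },
    { name: 'Latency', statistic: 'Average' }
  ];

  return metrics.map(function(metric) {
    return {
      Namespace: 'AWS/ELB',
      MetricName: metric.name,
      Dimensions: [{ Name: 'LoadBalancerName', Value: elbname }],
      StartTime: new Date(startTime),
      EndTime: new Date(endTime),
      Period: 60,
      Statistics: [metric.statistic]
    };
  });
}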

[getMetrics] response format

The final act of getMetrics will be to return all datapoints in a sensible way, so that they can be consumed for analysis later on.

@aarthykc I'm thinking the final response from getMetrics should be formatted like this:

{
    'request count': [ /* all of your datapoints here */ ],
    'latency': [ /* datapoints */ ],
    '2xx': [ /* datapoints */ ],
    '3xx': [ /* datapoints */ ],
    '4xx': [ /* datapoints */ ],
    '5xx': [ /* datapoints */ ]
}

Just opening this for visibility, you can implement in #1 and close here when done?
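A minimal sketch of how getMetrics could assemble that object with the AWS SDK, assuming the queries array is ordered the way the prepareQueries sketch above builds it:

var AWS = require('aws-sdk');

// Sketch only: run each prepared query against CloudWatch and key the datapoints
// by a friendly name for later analysis.
function getMetrics(queries, region, callback) {
  var cloudwatch = new AWS.CloudWatch({ region: region });
  var keys = ['2xx', '3xx', '4xx', '5xx', 'request count', 'latency'];
  var results = {};
  var remaining = queries.length;
  var failed = false;

  queries.forEach(function(query, i) {
    cloudwatch.getMetricStatistics(query, function(err, data) {
      if (failed) return;
      if (err) { failed = true; return callback(err); }
      results[keys[i]] = data.Datapoints;
      if (--remaining === 0) callback(null, results);
    });
  });
}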

Documentation

The README.md should have some basic documentation on how to use this tool.

  • Description of what it does
  • How to install
  • Usage
