newrelic / nr1-slo-r

NR1 SLO-R allows you to define, calculate and report on service-level objective (SLO) attainment.

Home Page: https://discuss.newrelic.com/t/track-your-service-level-objectives-with-the-slo-r-nerdpack/90046

License: Apache License 2.0

JavaScript 83.55% SCSS 16.45%
newrelic nerdpack nr1 nr1-slo-r slo error-slos alert-slos

nr1-slo-r's Introduction

SLO/R

Announcement

New Relic Service Level Management is now in Beta! To find out more please take a look at the docs.

Service Level Management introduces the ability to define and analyse Service Levels with a scalable and centralized user experience.

For users of SLO/R you can migrate the existing SLOs you have defined to this new format. Just follow the instructions in our migration companion.

What does this mean for the future of SLO/R?

  • We will be retiring SLO/R from the New Relic Apps Catalog. We highly recommend any new users take advantage of the in-product SLM experience as it is far superior to the SLO/R open source project.
  • We will update this repo to legacy status, and keep the code available as an example of working with SLOs.
  • For active users of SLO/R we will be reaching out to ensure your transition to the in-product experience is as easy as possible.

Usage

SLO/R lets you quickly define SLOs for error, availability, capacity, and latency conditions.

You can use the application for reporting out your results. By measuring SLO attainment across your service estate, you’ll be able to determine what signals are most important.

Using New Relic as a consistent basis to define and measure your SLOs offers better insight into comparative SLO attainment in your service delivery organization.

SLO/R provides two mechanisms for calculating SLOs. Event-based SLOs cover availability and latency, calculated from defects or a specified latency on transactions. Custom (alert) based SLOs cover availability, capacity, and latency, calculated from the total duration of alert violations.
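
As a rough illustration of the two mechanisms, the sketch below computes attainment both ways. The formulas are assumptions made for illustration (the share of non-defective transactions, and the share of the window not spent in violation), and the function names are hypothetical. The error driven and alert driven SLO documents remain the authoritative description of what SLO/R actually calculates.

```javascript
// Hypothetical helpers; the formulas are illustrative assumptions,
// not necessarily SLO/R's exact logic.

// Event-based: percentage of transactions that were not defects.
function eventAttainment(totalTransactions, defectTransactions) {
  if (totalTransactions === 0) return 100; // no traffic, nothing violated
  return 100 * (1 - defectTransactions / totalTransactions);
}

// Alert-based: percentage of the window not spent in alert violation.
function alertAttainment(windowMinutes, violationMinutes) {
  return 100 * (1 - violationMinutes / windowMinutes);
}

console.log(eventAttainment(10000, 5).toFixed(3)); // 99.950
console.log(alertAttainment(7 * 24 * 60, 30).toFixed(3)); // 30 minutes of violation in a 7 day window
```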

We are keen to see SLO/R evolve and grow to include additional features and visualizations. For version 1.0.1, we wanted to ship the core SLO calculation capabilities. We expect to rapidly build upon this core functionality through several releases. Please add an issue to the repo if there's a feature you'd like to see. For more details about the SLOs and their calculations, please see error driven SLOs and alert driven SLOs.

Open source license

This project is distributed under the Apache 2 license.

Dependencies

Requires New Relic APM.

SLO/R is intended to work specifically with services reporting to New Relic via an APM Agent. The service provides an entity upon which to define SLOs.

Getting started

  1. First, ensure that you have Git and NPM installed. If you're unsure whether they are installed, run the following commands. Each prints a version number if the tool is installed; otherwise the command won't be recognized:

    git --version
    npm -v
  2. Next, install the New Relic One CLI by going to this link and following the instructions (5 minutes or less) to install and set up your New Relic development environment.

  3. Next, to clone this repository and run the code locally against your New Relic data, execute the following command:

    nr1 nerdpack:clone -r https://github.com/newrelic/nr1-slo-r.git
    cd nr1-slo-r
    nr1 nerdpack:serve

Visit https://one.newrelic.com/?nerdpacks=local, navigate to the Nerdpack, and ✨

Deploying this Nerdpack

Open a command prompt in the nerdpack's directory and run the following commands.

# To create a new uuid for the nerdpack so that you can deploy it to your account:
nr1 nerdpack:uuid -g [--profile=your_profile_name]

# To see a list of APIkeys / profiles available in your development environment:
# nr1 profiles:list
nr1 nerdpack:publish [--profile=your_profile_name]
nr1 nerdpack:deploy [-c [DEV|BETA|STABLE]] [--profile=your_profile_name]
nr1 nerdpack:subscribe [-c [DEV|BETA|STABLE]] [--profile=your_profile_name]

Visit https://one.newrelic.com, navigate to the Nerdpack, and ✨

Configuring SLO/R Alert Webhook

The custom events - availability, capacity, and latency SLO types within SLO/R are calculated using the total duration of alert violations. In order to record those alert violations we need to enable an Insights directed Webhook to capture the open and close events.

The alert payload needs to be as specified for SLO/R to operate as expected. Please follow these instructions to enable the alert event forwarding.

For more information on sending alert data to New Relic, see Sending Alerts data to New Relic.
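
To make the idea of "total duration of alert violations" concrete, here is a minimal sketch that pairs open and close events and sums the time in between. The event shape (`incidentId`, `state`, `timestamp`) and the function name are hypothetical; the real webhook payload is the one specified in the SLO/R alert configuration instructions.

```javascript
// Hypothetical event shape: { incidentId, state: 'open' | 'closed', timestamp }
// with timestamp in epoch milliseconds. Not the actual SLO/R payload.
function totalViolationMs(events, windowEndMs) {
  const openedAt = new Map();
  let total = 0;
  const ordered = [...events].sort((a, b) => a.timestamp - b.timestamp);

  for (const e of ordered) {
    if (e.state === 'open') {
      openedAt.set(e.incidentId, e.timestamp);
    } else if (e.state === 'closed' && openedAt.has(e.incidentId)) {
      total += e.timestamp - openedAt.get(e.incidentId);
      openedAt.delete(e.incidentId);
    }
  }

  // Incidents that never closed count up to the end of the window.
  for (const start of openedAt.values()) {
    total += Math.max(0, windowEndMs - start);
  }
  return total;
}

const sampleEvents = [
  { incidentId: 'a', state: 'open', timestamp: 0 },
  { incidentId: 'a', state: 'closed', timestamp: 60000 },
  { incidentId: 'b', state: 'open', timestamp: 100000 },
];
console.log(totalViolationMs(sampleEvents, 130000)); // 90000
```

Note that this sketch counts overlapping incidents twice; a real calculation would likely need to merge overlapping intervals first.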

How to configure and use SLO/R

Configuration in Entity Explorer

SLO definitions are scoped and stored with service entities. Open a service entity by exploring your services in the Entity explorer from the New Relic One homepage.

Screenshot #6

Select the service you are interested in creating SLOs for. In our example we will be using the Origami Portal Service. Screenshot #7

Select the SLO/R New Relic One app from the left-hand navigation in your entity. Screenshot #16

If you (or others) haven't configured an SLO the canvas will be empty. Just click on the Define an SLO button to begin configuring your first SLO. Screenshot #1

The UI will open a side-panel to facilitate configuration. Fill in the fields:

  • SLO Name: Give your SLO a name. It has to be unique for the service; otherwise it will overwrite a similarly named SLO for this entity.
  • Description: Give a quick overview of what you're basing this SLO on.
  • SLO Group: This is grouping meta-data. Typically organizations are responsible for multiple services and SLOs. This gives us an ability to roll up the SLO to an organizational attainment.
  • Target attainment: The numeric percentage you want as your SLO target (e.g. 99.995).
  • Indicator: There are four indicators for SLOs in SLO/R - Error, Availability, Capacity, and Latency. Error SLOs are calculated from Transaction event defects. Availability, latency, and capacity SLOs are calculated by alert violations.

Example error SLO Screenshot #3

For Error SLOs you need to define the defects you wish to measure and the transaction names you want to associate with this SLO.

Example Availability SLO Screenshot #2

Alert driven SLOs depend on alert events being reported in the SLOR_ALERTS table. Please see SLO/R alerts config to ensure you're set up to capture alert events.

Once you've created a few SLOs you should see a view like the following:

Screenshot #4

Configuration in Launcher app

The other way to configure an SLO is through the Launcher app. The difference from creating SLOs in the Entity explorer is that you must select the entity first.

Screenshot #27

Using app from Launcher

You can combine multiple SLOs into tables, and your selection is stored in NRDB.

Screenshot #26

SLOs can be filtered by tags attached to them:

Screenshot #28

How is SLO/R arriving at the SLO calculations?

For details, see Alert SLOs and Error SLOs.

Community Support

New Relic hosts and moderates an online forum where you can interact with New Relic employees as well as other customers to get help and share best practices. Like all New Relic open source community projects, there's a related topic in the New Relic Explorers Hub. You can find this project's topic/threads here:

https://discuss.newrelic.com/t/track-your-service-level-objectives-with-the-slo-r-nerdpack/90046

Please do not report issues with SLO/R to New Relic Global Technical Support. Instead, visit the Explorers Hub for troubleshooting and best-practices.

Issues and enhancement requests

Issues and enhancement requests can be submitted in the Issues tab of this repository. Please search for and review the existing open issues before submitting a new issue.

Security

As noted in our security policy, New Relic is committed to the privacy and security of our customers and their data. We believe that providing coordinated disclosure by security researchers and engaging with the security community are important means to achieve our security goals.

If you believe you have found a security vulnerability in this project or any of New Relic's products or websites, we welcome and greatly appreciate you reporting it to New Relic through HackerOne.

Contributing

Contributions are welcome (and if you submit an enhancement request, expect to be invited to contribute it yourself 😁). Please review our contributors guide.

Keep in mind that when you submit your pull request, you'll need to sign the CLA via the click-through using CLA-Assistant. If you'd like to execute our corporate CLA, or if you have any questions, please drop us an email at [email protected].

nr1-slo-r's People

Contributors

artuone83, csandels, danielgolden, devfreddy, idirouhab, jbeveland27, khpeet, michellosier, norbertsuski, nr-opensource-bot, prototypicalpro, reptorz, ricegi, rudouglas, seankcarpenter1, semantic-release-bot, shahramk, snyk-bot, tangollama, zuluecho9

nr1-slo-r's Issues

Configure for circle ci

Prerequisites:

  • Github Personal Access token with "public_repo" scope
  • Snyk API Token
  • CircleCI Personal API Token

Required setup items:

  • Add project to CircleCI

  • Configure advanced CircleCI settings - https://circleci.com/gh/newrelic/nr1-customer-journey/edit#advanced-settings

    • Only build pull requests
    • Build forked pull requests
  • Add branch protection for the master branch (even from administrators)

  • Add a Deploy Key (w/write access) to github repo (public key) & CircleCI (private key)

  • Add Github Personal access token as GITHUB_TOKEN ENV variable. Required by: Semantic-Release Github Plugin

  • Add Snyk token as SNYK_TOKEN ENV variable in CircleCI

  • Add an appropriate NR api key for use in deployment with the nr1 cli as NR1CLI_PROFILE_DEMOTRONV2 ENV variable in CircleCI

  • Copy .circleci directory

  • In .circleci/config.yml update GIT_AUTHOR_EMAIL and GIT_COMMITTER_EMAIL to match project repo

  • In .circleci/config.yml update fingerprint to Deploy key

  • Import bot users into CLA assistant

Sample CSV import file:

user,email,agreement
circleci[bot],[email protected],TRUE
@semantic-release-bot,[email protected],TRUE

Alert Condition based SLOs

Summary

Currently you can only assign an alert driven SLO by policy. However that would mean you need a 1:1:1 on entity - condition - policy for these alerts to be accurate. Example - entity: app1, condition: throughput high, policy: app1 high throughput

Desired Behaviour

It would be great if we could select by condition under the policy.
Example - Policy: Backend, conditions: high CPU, high response time, low apdex

If you have policies grouped by app or function there needs to be a way to specify which condition we would like to pull out of the policy.

Possible Solution

Adding one more layer to the form to select the condition under the policy if needed. The condition name is being captured in the JSON payload from alerts already.

Allow editing of SLO Result to indicate blackout windows or rejected defects

Provide a mechanism to take a given SLO result and post-edit it to annotate defects or alert periods that were part of an expected blackout, or items that should not apply to the SLO calculation.

Ensure these items are well documented and the revised SLO calculations appear with suitable annotation.

Launcher - Summary View

For each indicator (have a section per indicator):

  • render a summary "row" summarizing each indicator (in-memory using row-level data)
  • render a table or list of each SLO
// Some pseudo-code

render() {
  // One summary row and one table of SLOs per indicator
  return SLO_INDICATORS.map((indicator) => (
    <React.Fragment key={indicator}>
      <IndicatorSummary></IndicatorSummary>
      <IndicatorTable></IndicatorTable>
    </React.Fragment>
  ));
}

Calculation Blackout Periods

The ability to specify a blackout period for an SLO definition so that known downtimes will be excluded from the calculations of SLO attainment.

Summary

Allows us to make better SLO designs that represent some of the variable aspects of time based SLO calculation.

Desired Behaviour

There should be a policy dialog within the SLO configuration, with the ability to specify a recurring policy or a one-off period of time. These should persist with the policy, probably as an array.

Possible Solution

As above: a dialog, plus a "blackouts" section in slo.json.
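
One way the exclusion could work is sketched below: subtract the part of each violation that falls inside a blackout window before counting it against the SLO. Everything here is hypothetical, including the `blackouts` array shape and the function names; the issue only proposes that such a section exist.

```javascript
// Hypothetical shapes: every interval is { start, end } in epoch milliseconds.
const sloDefinition = {
  name: 'example-availability',              // illustrative only
  blackouts: [{ start: 50, end: 200 }],      // assumed "blackouts" section
};

// Length of the overlap between two intervals, or 0 if they do not intersect.
function overlapMs(a, b) {
  return Math.max(0, Math.min(a.end, b.end) - Math.max(a.start, b.start));
}

// Violation time that still counts, assuming blackouts do not overlap each other.
function countedViolationMs(violations, blackouts) {
  return violations.reduce((sum, v) => {
    const excluded = blackouts.reduce((s, b) => s + overlapMs(v, b), 0);
    return sum + (v.end - v.start) - excluded;
  }, 0);
}

const violations = [
  { start: 0, end: 100 },   // half of this falls inside the blackout
  { start: 300, end: 400 }, // entirely outside
];
console.log(countedViolationMs(violations, sloDefinition.blackouts)); // 150
```

Recurring blackouts would first need expanding into concrete intervals for the reporting window; that step is left out of this sketch.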

Additional context

Just want to make the most useful configurator evah!

Provide alerting on SLO budget consumption

Per feedback -

The amount of budget we have left is what we need to monitor and also to alert on the rate of which that budget is consumed. We need to know if something is about to fall over outside of the norm.

Style the "view details" modal

Screenshot 2019-12-09 at 10.51.04 a.m.

It needs:

  • a heading
  • A section for NRQL that is either an accordion or tabs
  • That json output should be some pretty UI showing the definition of the document

Error indicator not using correct field

When setting up an Error indicator SLO, it is filtering only on the httpResponseCode field and not including the response.status one. For at least .NET Framework agents, the httpResponseCode field is not populated but the response.status one is.
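
A fix along these lines would read the status code from whichever attribute the agent populated. The sketch below shows the idea on plain event objects; the helper names are hypothetical and this is not the project's actual query code.

```javascript
// Hypothetical helper: prefer httpResponseCode, fall back to response.status.
function statusCodeOf(event) {
  const raw = event.httpResponseCode ?? event['response.status'];
  return raw === undefined || raw === null ? null : Number(raw);
}

// True when the event's status code is one of the configured defect codes.
function isDefect(event, defectCodes) {
  const code = statusCodeOf(event);
  return code !== null && defectCodes.includes(code);
}

console.log(isDefect({ 'response.status': '500' }, [500, 503])); // true
console.log(isDefect({ httpResponseCode: '200' }, [500, 503]));  // false
```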

NRQL query for alert based SLO not correct

Description

After defining an SLO, it doesn't work. When viewing the details of the SLO, the NRQL contains WHERE policy_name IN (''), which obviously won't work.

Steps to Reproduce

Define a SLO

Expected Behaviour

Should generate the correct NRQL.
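
A guard of the following kind would stop an empty selection from producing IN (''). This is a minimal sketch with a hypothetical function name, not the project's code. The backslash escaping of single quotes is an assumption about NRQL string literals and should be checked against the NRQL reference.

```javascript
// Hypothetical helper: builds the policy filter and refuses an empty selection.
function policyFilter(policyNames) {
  const names = policyNames.filter(
    (n) => typeof n === 'string' && n.trim() !== ''
  );
  if (names.length === 0) {
    throw new Error('Select at least one alert policy before saving the SLO');
  }
  // Assumption: a single quote inside a literal is escaped with a backslash.
  const quoted = names.map((n) => `'${n.replace(/'/g, "\\'")}'`);
  return `WHERE policy_name IN (${quoted.join(', ')})`;
}

console.log(policyFilter(['Backend', 'API']));
// WHERE policy_name IN ('Backend', 'API')
```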

Relevant Logs / Console output

Your Environment

  • NR1 CLI version used: @datanerd/nr1/1.2.2 win32-x64 node-v10.16.3
  • Browser name and version: Chrome
  • Operating System and version: Windows 10

Additional context

image

Style the new SLO table

We're using a new, more capable, table component. However, since the switch the styling of the table has regressed. Fix that.

SLO/R Overview shows no SLOs defined, even though several are created

Description

When I launch the SLO/R Overview page, it shows no SLOs defined. The SLO Group dropdowns are empty, even though several SLOs and Groups have been created in NR1.

Steps to Reproduce

1 - Go to a service and define a new Errors SLO
2 - Name the group while defining the SLO
3 - Go to the main NR1 page, and click on the SLO/R Launcher
4 - No SLOs are showing

Expected Behaviour

Defined SLOs should show in the SLO/R Overview page.

  • NR1 CLI version used: 1.10.10 darwin-x64 node-v10.16.3
  • Browser name and version: Chrome
  • Operating System and version: MacOS

Additional Attachments

image

image

Use time picker to determine the limit for config transaction loads

The current config defaults to looking at the top 100 transactions from the last month. For accounts with billions of events this can easily time out, so we should tie the transaction and alerts selection lists to those events discovered during the time picker window.

Review terminology

Per feedback

The easiest feedback we got was around how we named items, eg. the SLI's are the latency, throughput, uptime and error. The objective would then be defined (the SLO) as the target, where we named the SLI's as "Type", he said that was confusing, it should be the Indicator.

Also the term "error budget" as an SLO is not correct in that dropdown. It should be "errors" as that is the indicator.

The way he explained it was that if you have 100 transactions a day, the target would be that I want 90 of those 100 transactions to be error free. And the other 10 is the "budget" which is your SLO.
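
The arithmetic in that explanation can be written down directly. The sketch below uses hypothetical function names and rounds the budget to a whole number of transactions.

```javascript
// Number of transactions allowed to fail while still meeting the target.
function errorBudget(totalTransactions, targetPercent) {
  return Math.round(totalTransactions * (1 - targetPercent / 100));
}

// Budget left after the defects observed so far (negative means overspent).
function budgetRemaining(totalTransactions, targetPercent, defects) {
  return errorBudget(totalTransactions, targetPercent) - defects;
}

console.log(errorBudget(100, 90));        // 10
console.log(budgetRemaining(100, 90, 4)); // 6
```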

Two SLO's based upon Error indicator in the same SLO group overwrite the defects selected in the dropdown

Description

When creating two SLO's in the same SLO group, if both use the Error indicator then the defects selected in the dropdown (5xx errors, 401-unauthorised etc) are shared between the two SLO's. For example, if you wanted one SLO for 5xx errors and another for everything else, this is impossible because when you change one, the other is changed too.

Steps to Reproduce

Have two Error SLO's in a single SLO group. They have different transactions selected. Set one to use 5XX defects, the other to use 401 - Unauthorised. Then change one Error SLO and add a new defect. You should observe they now BOTH have this defect selected.

Expected Behaviour

I'd expect to be able to have two (or more) separate Error SLO's in an SLO group each scoped to different transactions and different defects (5xx, 401, 403 etc).

Relevant Logs / Console output

None unfortunately.

Your Environment

  • NR1 CLI version used:
    @datanerd/nr1/1.10.10 darwin-x64 node-v10.16.3
  • Browser name and version:
    Chrome 79 64 bit. Also observed on customer machine (versions unknown)
  • Operating System and version:
    Mac OS 10.14.3

Additional context

None.

SLO/R Entity Nerdlet not refreshing on Entity select

When looking at SLOs in the SLO/R entity nerdlet, if you select a new entity in the breadcrumbs, the view does not update.

Description

see above

Steps to Reproduce

see above

Expected Behaviour

The context should switch and you should see the SLOs associated with the new entity you've selected.

Relevant Logs / Console output

N/A

Your Environment

N/A

Additional context

None

Create Calendar View for each SLO

Provide a Weekly / Monthly view for each SLO attainment calculation.

Summary

Provides a view of SLO attainment in the conventional calendar sense rather than the rolling current, 7 day, and 30 day window.

Desired Behaviour

Possible toggle between calendar view and the current rolling SLO calculations.

Possible Solution

...

Additional context

Most people will want to have a calendar view of SLO attainment.

SLO as Code

Provide documentation and tooling for defining SLO definition via GraphQL mutation and integration into CICD pipeline.

Open this NRQL in chart builder

Allow the NRQL behind an SLO definition to be automatically examined in Chart Builder.

See Graphiql Notebook or Datalyzer for a crib.

Auto Assign SLOs

Ability to scan services and auto assign SLOs for latency, availability, throughput and error budget.

Using historical data, take a running average to define the "99.5 percentile" and auto assign those numbers as targets. In this way, large organizations can quickly ramp up to speed and only adjust as necessary, rather than manually set up each process.

Error when editing an Error-based SLO

Description

When editing an existing Error SLO, I click on the three dots, then Edit. Add a new defect to an Error SLO and click Update Service, intermittently, it doesn't update. So I click back on Edit on the same SLO and the defect I added is not there.

Steps to Reproduce

See description above.

Expected Behaviour

When I add a new defect type to an existing Error SLO using the Edit option and click Update Service, any edits I make are persisted.

Relevant Logs / Console output

When I click on the Update Service button, I observed this error in the console

single-document.js:25 Uncaught (in promise) TypeError: e.map is not a function
    at c (single-document.js:25)
    at s (single-document.js:33)
    at single-document.js:65
    at c (runtime.js:45)
    at Generator._invoke (runtime.js:271)
    at Generator.T.forEach.e.<computed> [as next] (runtime.js:97)
    at c (runtime.js:45)
    at t (runtime.js:135)
    at runtime.js:170
    at new Promise (<anonymous>)

Your Environment

  • NR1 CLI version used:
    @datanerd/nr1/1.10.10 darwin-x64 node-v10.16.3
  • Browser name and version:
    Chrome 79
  • Operating System and version:
    Mac OS 10.14.3

Additional context

SLO Group had multiple Error SLO's defined within it.

When creating new SLO's, make SLO Group a drop down with options of existing groups

Summary

It's hard to remember/know which groups you've already created. To aid the user experience and prevent lots of duplicated SLO groups being created, you should be able to pick an SLO group from the dropdown when defining a new SLO

Desired Behaviour

When defining a new SLO, the field for SLO Group should be a dropdown (if groups exist) or if not, allow user to create the first new group.

Possible Solution

Store groups in nerdstorage so the component for defining a new SLO can check to see if groups already exist, if so, display them in a dropdown.

Additional context

It's a poor user experience where you have to remember the groups already existing and spell the group exactly right for the SLO to end up in the right group.

Alert defined for SLOs (Budget Perspective)

Summary

As SLOs are defined we should think about them in terms of their overall budget. If the attainment objective is 99.98 ... alerting on the rate of budget consumption versus the total amount of time remaining in the time period.

e.g. - Error SLO of 99.5 ... halfway through the measurement period we are at 99.6 attainment - meaning based on a straight line rate calculation for the SLO we are not going to make our time-bound objective.
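
The straight-line calculation can be sketched as follows. It assumes the 99.6 figure is measured against the whole period, so 0.4 of the 0.5 points of budget are already spent at the halfway mark; that reading is an interpretation of the example, and the function names are hypothetical.

```javascript
// budgetConsumed and timeElapsed are fractions between 0 and 1.
// A burn rate above 1 means the budget runs out before the period ends.
function burnRate(budgetConsumed, timeElapsed) {
  return budgetConsumed / timeElapsed;
}

// Attainment at the end of the period if consumption continues at the same rate.
function projectedAttainment(targetPercent, budgetConsumed, timeElapsed) {
  const budget = 100 - targetPercent;
  return 100 - budget * burnRate(budgetConsumed, timeElapsed);
}

const consumed = (100 - 99.6) / (100 - 99.5); // roughly 0.8 of the budget used
console.log(burnRate(consumed, 0.5));                  // about 1.6
console.log(projectedAttainment(99.5, consumed, 0.5)); // about 99.2, below the 99.5 target
```

An alert condition could then fire when the burn rate stays above 1 for the 7 day, 30 day, or calendar-month window in question.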

Desired Behaviour

Alerts defined for SLOs that are sophisticated enough to execute the rate based budget consumption alerting for an SLO

Possible Solution

TBD

Additional context

Use time window and rate of consumption for the alerting context ...

  • 7 day SLO alert
  • 30 day SLO alert
  • Specific Month SLO alert

Need way to define a new SLO from the Launcher

Summary

The definition of an SLO linked to a simple entity is too opaque - we need to make it easier to get to the SLO definition. In the case of Alert derived SLOs there is a really loose correlation between the entity and the Alert. So what's the point of limiting the definition of SLOs to the entity?

Desired Behaviour

It is easy to define everything you need for an SLO in one place.

Possible Solution

TBD - modification of the SLO definition experience

Additional context

Use entity meta-data (tags / labels) to provide the overview orchestration

Summary

As shipped, SLO/R relies on the construct of an SLO Group (née organization, née team) to group multiple SLOs into one attainment. This adds an artificial construct that won't age well in New Relic, so it would be better to use the meta-data that is already available with the entities as the basis for grouping. NR users can then just worry about organizing their entities with proper taxonomy instead of having to re-do it in SLO/R.

Desired Behaviour

Select from a list of available metadata, or begin typing the metadata, and you get an overview report for all the SLOs on all the entities that contain that metadata. We could allow multiple tags to narrow the context.

Possible Solution

Update the composite / organization query logic to take an array of applicable entities based on the various tags chosen.

Additional context

I think this would dramatically improve the flexibility to report overviews for SLOs.
