newrelic / nr1-slo-r

NR1 SLO-R allows you to define, calculate and report on service-level objective (SLO) attainment.

Home Page: https://discuss.newrelic.com/t/track-your-service-level-objectives-with-the-slo-r-nerdpack/90046

License: Apache License 2.0

JavaScript 83.55% SCSS 16.45%
newrelic nerdpack nr1 nr1-slo-r slo error-slos alert-slos

nr1-slo-r's Issues

Use time picker to determine the limit for config transaction loads

The current config defaults to look at the top 100 transactions from the last Month. For accounts with billions of events this can pretty easily time out. So we should tie the transaction and alerts selection lists to those events discovered during the time picker window.
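
One way to approach this is to derive the NRQL time clause from whatever the time picker currently holds. The sketch below is only an illustration: the helper names are hypothetical, and it assumes the time range object carries either `duration` (milliseconds) or `begin_time`/`end_time` (epoch milliseconds). The entity filter that a real query would need is left out.

```javascript
// Hypothetical helper: turn a time-picker range into NRQL time clauses.
// Assumes `timeRange` has either `duration` (ms) or `begin_time`/`end_time` (epoch ms).
function timeRangeToNrql(timeRange) {
  if (timeRange && timeRange.begin_time && timeRange.end_time) {
    return `SINCE ${timeRange.begin_time} UNTIL ${timeRange.end_time}`;
  }
  if (timeRange && timeRange.duration) {
    const minutes = Math.max(1, Math.round(timeRange.duration / 60000));
    return `SINCE ${minutes} MINUTES AGO`;
  }
  // Fallback when nothing is selected: a short default window, not a month.
  return 'SINCE 60 MINUTES AGO';
}

// Hypothetical query shape for the transaction selection list.
function buildTransactionListQuery(timeRange, limit = 100) {
  const time = timeRangeToNrql(timeRange);
  return `SELECT count(*) FROM Transaction FACET name ${time} LIMIT ${limit}`;
}
```

The same time clause could be reused for the alerts selection list so both lists stay scoped to the picker window.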

Error when editing an Error-based SLO

Description

When editing an existing Error SLO, I click on the three dots, then Edit. I add a new defect to the Error SLO and click Update Service, but intermittently the SLO doesn't update: when I click Edit on the same SLO again, the defect I added is not there.

Steps to Reproduce

See description above.

Expected Behaviour

When I add a new defect type to an existing Error SLO using the Edit option and click Update Service, any edits I make are persisted.

Relevant Logs / Console output

When I click on the Update Service button, I observed this error in the console

single-document.js:25 Uncaught (in promise) TypeError: e.map is not a function
    at c (single-document.js:25)
    at s (single-document.js:33)
    at single-document.js:65
    at c (runtime.js:45)
    at Generator._invoke (runtime.js:271)
    at Generator.T.forEach.e.<computed> [as next] (runtime.js:97)
    at c (runtime.js:45)
    at t (runtime.js:135)
    at runtime.js:170
    at new Promise (<anonymous>)

Your Environment

  • NR1 CLI version used:
    @datanerd/nr1/1.10.10 darwin-x64 node-v10.16.3
  • Browser name and version:
    Chrome 79
  • Operating System and version:
    Mac OS 10.14.3

Additional context

SLO Group had multiple Error SLO's defined within it.
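
The `e.map is not a function` message suggests that something other than an array reached a `.map` call. The root cause is not established in this report, so the following is only a defensive sketch: `toArray` and `defectLabels` are hypothetical names, and the `label` property is an assumption about the defect shape.

```javascript
// Hypothetical guard: always hand `.map` an array.
function toArray(value) {
  if (Array.isArray(value)) return value;
  if (value === null || value === undefined) return [];
  return [value];
}

// Example: `defects` may arrive as an array, a single object, or nothing.
// The `label` property is assumed for illustration.
function defectLabels(defects) {
  return toArray(defects).map((defect) => defect.label);
}
```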

SLO/R Overview shows no SLOs defined, even though several are created

Description

When I launch the SLO/R Overview page, it shows no SLOs defined. The SLO Group dropdowns are empty, even though several SLOs and Groups have been created in NR1.

Steps to Reproduce

1 - Go to a service and define a new Errors SLO
2 - Name the group while defining the SLO
3 - Go to the main NR1 page, and click on the SLO/R Launcher
4 - No SLOs are showing

Expected Behaviour

Defined SLOs should show in the SLO/R Overview page.

  • NR1 CLI version used: 1.10.10 darwin-x64 node-v10.16.3
  • Browser name and version: Chrome
  • Operating System and version: MacOS

Create Calendar View for each SLO

Provide a Weekly / Monthly view for each SLO attainment calculation.

Summary

Provides a view of SLO attainment in the conventional calendar sense rather than the rolling current, 7 day, and 30 day window.

Desired Behaviour

Possible toggle between calendar view and the current rolling SLO calculations.
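
As a rough illustration of the difference from a rolling window, a calendar view needs fixed month boundaries rather than "now minus 30 days". The helper below is hypothetical and uses UTC boundaries as a simplifying assumption.

```javascript
// Hypothetical helper: calendar-month window (UTC) containing `date`.
function calendarMonthWindow(date) {
  const year = date.getUTCFullYear();
  const month = date.getUTCMonth();
  return {
    begin: Date.UTC(year, month, 1),   // first millisecond of the month
    end: Date.UTC(year, month + 1, 1), // exclusive: first millisecond of the next month
  };
}
```

The `begin` and `end` values are epoch milliseconds, so they could be dropped into a `SINCE ... UNTIL ...` clause.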

Possible Solution

...

Additional context

Most people will want to have a calendar view of SLO attainment.

Provide alerting on SLO budget consumption

Per feedback -

We need to monitor the amount of budget we have left, and also alert on the rate at which that budget is consumed. We need to know if something is about to fall over outside of the norm.

SLO as Code

Provide documentation and tooling for defining SLOs via GraphQL mutation and integrating them into a CI/CD pipeline.

Error indicator not using correct field

When setting up an Error indicator SLO, it is filtering only on the httpResponseCode field and not including the response.status one. For at least .NET Framework agents, the httpResponseCode field is not populated but the response.status one is.
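
A possible shape for the fix is to test both attributes in the generated filter. This is a sketch only: `statusFilter` is a hypothetical helper, and it assumes both attributes are stored as strings so that `LIKE` applies.

```javascript
// Hypothetical helper: match a status-code pattern against both attributes.
// Backticks quote the dotted attribute name in NRQL.
const STATUS_ATTRIBUTES = ['httpResponseCode', '`response.status`'];

function statusFilter(pattern) {
  const clauses = STATUS_ATTRIBUTES.map((attr) => `${attr} LIKE '${pattern}'`);
  return `(${clauses.join(' OR ')})`;
}
```

For example, `statusFilter('5%')` yields a clause that matches 5xx codes regardless of which of the two attributes the agent populates.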

Configure for circle ci

Prerequisites:

  • Github Personal Access token with "public_repo" scope
  • Snyk API Token
  • CircleCI Personal API Token

Required setup items:

  • Add project to CircleCI

  • Configure advanced CircleCI settings - https://circleci.com/gh/newrelic/nr1-customer-journey/edit#advanced-settings

    • Only build pull requests
    • Build forked pull requests
  • Add branch protection for the master branch (even from administrators)

  • Add a Deploy Key (w/write access) to github repo (public key) & CircleCI (private key)

  • Add Github Personal access token as GITHUB_TOKEN ENV variable. Required by: Semantic-Release Github Plugin

  • Add Snyk token as SNYK_TOKEN ENV variable in CircleCI

  • Add an appropriate NR api key for use in deployment with the nr1 cli as NR1CLI_PROFILE_DEMOTRONV2 ENV variable in CircleCI

  • Copy .circleci directory

  • In .circleci/config.yml update GIT_AUTHOR_EMAIL and GIT_COMMITTER_EMAIL to match project repo

  • In .circleci/config.yml update fingerprint to Deploy key

  • Import bot users into CLA assistant

Sample CSV import file:

user,email,agreement
circleci[bot],[email protected],TRUE
@semantic-release-bot,[email protected],TRUE

Allow editing of SLO Result to indicate blackout windows or rejected defects

Provide a mechanism to take a given SLO result and post-edit it to annotate defects or alert periods that were part of an expected blackout, or items that should not apply to the SLO calculation.

Ensure these items are well documented and the revised SLO calculations appear with suitable annotation.

Auto Assign SLOs

Ability to scan services and auto assign SLOs for latency, availability, throughput and error budget.

Using historical data, take a running average to define the "99.5 percentile" and auto assign those numbers as targets. In this way, large organizations can quickly ramp up to speed and only adjust as necessary, rather than manually set up each process.
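
To make the idea concrete, here is a minimal sketch of deriving a suggested target from historical samples. It uses a nearest-rank percentile rather than a running average, which is a swapped-in technique chosen for simplicity; both function names are hypothetical.

```javascript
// Hypothetical helper: nearest-rank percentile of historical samples.
function percentile(samples, p) {
  if (samples.length === 0) throw new Error('no samples');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

// Suggest a latency target (same unit as the samples) from history.
function suggestLatencyTarget(durations, p = 99.5) {
  return percentile(durations, p);
}
```

The suggested value would only be a starting point that a team adjusts afterwards, as the request describes.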

Style the new SLO table

We're using a new, more capable, table component. However, since the switch the styling of the table has regressed. Fix that.

Two SLO's based upon Error indicator in the same SLO group overwrite the defects selected in the dropdown

Description

When creating two SLO's in the same SLO group, if both use the Error indicator then the defects selected in the dropdown (5xx errors, 401-unauthorised etc) are shared between the two SLO's. For example if you wanted one SLO for 5xx errors and another for everything else, this is impossible because when you change one, the other is changed too.

Steps to Reproduce

Have two Error SLO's in a single SLO group. They have different transactions selected. Set one to use 5XX defects, the other to use 401 - Unauthorised. Then change one Error SLO and add a new defect. You should observe they now BOTH have this defect selected.

Expected Behaviour

I'd expect to be able to have two (or more) separate Error SLO's in an SLO group each scoped to different transactions and different defects (5xx, 401, 403 etc).

Relevant Logs / Console output

None unfortunately.

Your Environment

  • NR1 CLI version used:
    @datanerd/nr1/1.10.10 darwin-x64 node-v10.16.3
  • Browser name and version:
    Chrome 79 64 bit. Also observed on customer machine (versions unknown)
  • Operating System and version:
    Mac OS 10.14.3

Additional context

None.
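
One common cause of this kind of symptom is two objects holding a reference to the same array. Whether that is what happens here has not been confirmed, so the following is only a sketch of the defensive pattern; `createSlo`, `addDefect` and the `defects` field are hypothetical.

```javascript
// Hypothetical factory: every SLO gets its own copy of the defects array.
function createSlo(template, overrides) {
  const base = { ...template, ...overrides };
  return { ...base, defects: [...(base.defects || [])] };
}

// Hypothetical update helper: returns a new SLO instead of mutating in place.
function addDefect(slo, defect) {
  return { ...slo, defects: [...(slo.defects || []), defect] };
}
```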

When creating new SLO's, make SLO Group a drop down with options of existing groups

Summary

It's hard to remember/know which groups you've already created. To aid the user experience and prevent lots of duplicated SLO groups being created, you should be able to pick an SLO group from the dropdown when defining a new SLO

Desired Behaviour

When defining a new SLO, the field for SLO Group should be a dropdown (if groups exist) or if not, allow user to create the first new group.

Possible Solution

Store groups in nerdstorage so the component for defining a new SLO can check to see if groups already exist, if so, display them in a dropdown.
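
If the existing SLO documents are already loaded, the dropdown options could also be derived from them directly. A minimal sketch, assuming each document has a `group` string (the real field name may differ) and using a hypothetical helper name:

```javascript
// Hypothetical helper: unique, sorted group names from stored SLO documents.
// Assumes each document has a `group` string; the real field name may differ.
function listGroups(sloDocuments) {
  const names = sloDocuments
    .map((doc) => (doc.group || '').trim())
    .filter((name) => name.length > 0);
  return [...new Set(names)].sort((a, b) => a.localeCompare(b));
}
```

An empty result would be the cue to show a free-text field for creating the first group.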

Additional context

It's a poor user experience where you have to remember the groups already existing and spell the group exactly right for the SLO to end up in the right group.

Alert Condition based SLOs

Summary

Currently you can only assign an alert driven SLO by policy. However that would mean you need a 1:1:1 on entity - condition - policy for these alerts to be accurate. Example - entity: app1, condition: throughput high, policy: app1 high throughput

Desired Behaviour

It would be great if we could select by condition under the policy.
Example - Policy: Backend, conditions: high CPU, high response time, low apdex

If you have policies grouped by app or function there needs to be a way to specify which condition we would like to pull out of the policy.

Possible Solution

Adding one more layer to the form to select the condition under the policy if needed. The condition name is being captured in the JSON payload from alerts already.
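
The extra layer could translate into one more predicate on the alert query. The sketch below is an assumption-laden illustration: `alertFilter` and `quoteNrqlValue` are hypothetical, and `condition_name` is assumed to be the attribute under which the condition name from the payload is stored.

```javascript
// Hypothetical helper: quote a string for use as an NRQL literal.
function quoteNrqlValue(value) {
  return `'${String(value).replace(/'/g, "\\'")}'`;
}

// Hypothetical builder: filter by policy and, optionally, by conditions under it.
// `condition_name` is an assumed attribute name.
function alertFilter(policyName, conditionNames = []) {
  let where = `WHERE policy_name = ${quoteNrqlValue(policyName)}`;
  if (conditionNames.length > 0) {
    const list = conditionNames.map(quoteNrqlValue).join(', ');
    where += ` AND condition_name IN (${list})`;
  }
  return where;
}
```

With no conditions selected the filter falls back to the whole policy, which keeps the current behaviour available.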

Use entity meta-data (tags / labels) to provide the overview orchestration

Summary

As shipped, SLO/R relies on the construct of an SLO Group (née organization, née team) to group multiple SLOs into one attainment. This adds an artificial construct that won't age well in New Relic, so it would be better to use the metadata that is already available on the entities as the basis for grouping. NR users can then just worry about organizing their entities with a proper taxonomy instead of having to redo it in SLO/R.

Desired Behaviour

Select from a list of available metadata, or begin typing the metadata, and you get an overview report for all the SLOs on all the entities that contain that metadata. We could allow multiple tags to narrow the context.

Possible Solution

Update the composite / organization query logic to take an array of applicable entities based on the various tags chosen.
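
Selecting the applicable entities could look roughly like this. It is a sketch, assuming entities arrive shaped as `{ guid, tags: [{ key, values }] }`; the helper name and the `{ key, value }` shape for a chosen tag are hypothetical.

```javascript
// Hypothetical helper: keep entities that carry every selected tag.
// Assumes entities look like { guid, tags: [{ key, values: [...] }] }.
function filterEntitiesByTags(entities, selectedTags) {
  return entities.filter((entity) =>
    selectedTags.every(({ key, value }) =>
      (entity.tags || []).some((tag) => tag.key === key && tag.values.includes(value))
    )
  );
}
```

Requiring every selected tag to match is what makes additional tags narrow the context.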

Additional context

I think this would dramatically improve the flexibility to report overviews for SLOs.

Calculation Blackout Periods

The ability to specify a blackout period for an SLO definition so that known downtimes will be excluded from the calculations of SLO attainment.

Summary

Allows us to make better SLO designs that represent some of the variable aspects of time based SLO calculation.

Desired Behaviour

There should be a policy dialog with the SLO configuration, with the ability to specify a recurring policy or a one-off period of time. These should persist with the policy, probably as an array of them or something like that.

Possible Solution

As above: a dialog, plus a "blackouts" section of the slo.json
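
For illustration, a "blackouts" array could be applied by dropping events that fall inside any window before attainment is computed. The shape `[{ begin, end }]` in epoch milliseconds and both helper names are assumptions, and only one-off periods are covered here, not recurring ones.

```javascript
// Hypothetical shape for a "blackouts" section: [{ begin, end }] in epoch ms.
function isInBlackout(timestamp, blackouts) {
  return blackouts.some(({ begin, end }) => timestamp >= begin && timestamp < end);
}

// Drop events that fall inside any blackout before computing attainment.
function excludeBlackouts(events, blackouts) {
  return events.filter((event) => !isInBlackout(event.timestamp, blackouts));
}
```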

Additional context

Just want to make the most useful configurator evah!

Need way to define a new SLO from the Launcher

Summary

The definition of an SLO linked to a simple entity is too opaque - we need to make it easier to get to the SLO definition. In the case of Alert derived SLOs there is a really loose correlation between the entity and the Alert. So what's the point of limiting the definition of SLOs to the entity?

Desired Behaviour

It is easy to define everything you need for an SLO in one place.

Possible Solution

TBD - modification of the SLO definition experience

Additional context

NRQL query for alert based SLO not correct

Description

After defining an SLO it doesn't work. When viewing the details of the SLO the NRQL has WHERE policy_name IN ('') which obviously won't work.

Steps to Reproduce

Define a SLO

Expected Behaviour

Should generate the correct NRQL.
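
Whatever the cause turns out to be, the query builder could refuse to emit an empty policy list. A minimal sketch with a hypothetical `policyFilter` helper:

```javascript
// Hypothetical builder: refuse to emit `policy_name IN ('')`.
function policyFilter(policyNames) {
  const names = (policyNames || [])
    .map((name) => String(name).trim())
    .filter((name) => name.length > 0);
  if (names.length === 0) {
    throw new Error('An alert-based SLO needs at least one policy name');
  }
  const quoted = names.map((name) => `'${name.replace(/'/g, "\\'")}'`);
  return `WHERE policy_name IN (${quoted.join(', ')})`;
}
```

Failing loudly at definition time would surface the missing policy before a query that can never match is saved.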

Relevant Logs / Console output

Your Environment

  • NR1 CLI version used: @datanerd/nr1/1.2.2 win32-x64 node-v10.16.3
  • Browser name and version: Chrome
  • Operating System and version: Windows 10

Additional context


Review terminology

Per feedback

The easiest feedback we got was about how we named items. The SLIs are latency, throughput, uptime and error; the objective (the SLO) is then defined as the target. We labelled the SLIs as "Type", which he said was confusing: it should be "Indicator".

Also, the term "error budget" is not correct as an option in that dropdown. It should be "errors", as that is the indicator.

The way he explained it: if you have 100 transactions a day, the target would be that 90 of those 100 transactions are error free. The other 10 are the "budget" allowed by your SLO.
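
The arithmetic in that explanation can be written down as a small helper. The function name is hypothetical; the numbers in the usage note are the ones from the feedback.

```javascript
// Target: the percentage of transactions that must be error free.
// Budget: how many transactions may fail before the target is missed.
function errorBudget(totalTransactions, targetPercent) {
  const required = Math.ceil((totalTransactions * targetPercent) / 100);
  return totalTransactions - required;
}
```

With 100 transactions and a target of 90, `errorBudget(100, 90)` gives a budget of 10 transactions.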

Style the "view details" modal

It needs:

  • a heading
  • A section for NRQL that is either an accordion or tabs
  • That json output should be some pretty UI showing the definition of the document

Open this NRQL in chart builder

Allow the NRQL behind an SLO definition to be automatically examined in Chart Builder.

See Graphiql Notebook or Datalyzer for a crib.

Alert defined for SLOs (Budget Perspective)

Summary

As SLOs are defined we should think about them in terms of their overall budget. If the attainment objective is 99.98, we should alert on the rate of budget consumption versus the total amount of time remaining in the time period.

e.g. - Error SLO of 99.5: halfway through the measurement period we are at 99.6 attainment. Read as 0.4 of the 0.5 budget already consumed, a straight line rate calculation says we are not going to make our time-bound objective.

Desired Behaviour

Alerts defined for SLOs that are sophisticated enough to execute the rate based budget consumption alerting for an SLO

Possible Solution

TBD

Additional context

Use time window and rate of consumption for the alerting context ...

  • 7 day SLO alert
  • 30 day SLO alert
  • Specific Month SLO alert
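
A straight-line projection like the one in the example could be sketched as follows. This assumes the shortfall from 100 is directly comparable to the budget for the whole period (so 99.6 attainment against a 99.5 target means 0.4 of 0.5 is used); the function name and return shape are hypothetical.

```javascript
// Straight-line projection of budget consumption.
// target and attainment are percentages; elapsedFraction is in (0, 1].
function projectBudget(target, attainment, elapsedFraction) {
  const budget = 100 - target;                // e.g. 0.5 for a 99.5 target
  const consumed = 100 - attainment;          // e.g. 0.4 at 99.6 attainment
  const consumedFraction = consumed / budget; // share of the budget already used
  const projected = consumedFraction / elapsedFraction; // share used by period end
  return { consumedFraction, projected, willMiss: projected > 1 };
}
```

An alert could fire whenever `willMiss` is true for the 7 day, 30 day or calendar-month window being evaluated.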

SLO/R Entity Nerdlet not refreshing on Entity select

When looking at SLOs in the SLO/R entity nerdlet, if you select a new entity in the breadcrumbs selector the view does not update.

Description

see above

Steps to Reproduce

see above

Expected Behaviour

The context should switch and you should see the SLOs associated with the new entity you've selected.

Relevant Logs / Console output

N/A

Your Environment

N/A

Additional context

None

Launcher - Summary View

For each indicator (have a section per indicator):

  • render a summary "row" summarizing each indicator (in-memory using row-level data)
  • render a table or list of each SLO
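// Some pseudo-code

render() {
  // One summary plus one table per indicator; the key keeps list rendering stable.
  return SLO_INDICATORS.map((indicator) => (
    <React.Fragment key={indicator}>
      <IndicatorSummary></IndicatorSummary>
      <IndicatorTable></IndicatorTable>
    </React.Fragment>
  ));
}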
