newrelic / nr1-slo-r

NR1 SLO-R allows you to define, calculate and report on service-level objective (SLO) attainment.

Home Page: https://discuss.newrelic.com/t/track-your-service-level-objectives-with-the-slo-r-nerdpack/90046

License: Apache License 2.0

JavaScript 83.55% SCSS 16.45%

newrelic nerdpack nr1 nr1-slo-r slo error-slos alert-slos

nr1-slo-r's Issues

Use time picker to determine the limit for config transaction loads

The current config defaults to look at the top 100 transactions from the last Month. For accounts with billions of events this can pretty easily time out. So we should tie the transaction and alerts selection lists to those events discovered during the time picker window.
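A minimal sketch of how the selection-list query could follow the time picker. It assumes the picker exposes a range shaped like { begin_time, end_time, duration } in milliseconds; the helper names and the query text are hypothetical and are not taken from the SLO/R source.

```javascript
// Hypothetical helper: turn a time-picker range into a NRQL time clause.
// Assumption: timeRange looks like { begin_time, end_time, duration } (ms).
function timeRangeToNrql(timeRange) {
  const fallback = 'SINCE 1 MONTH AGO'; // the current default described above
  if (!timeRange) return fallback;
  const { begin_time, end_time, duration } = timeRange;
  if (begin_time && end_time) return `SINCE ${begin_time} UNTIL ${end_time}`;
  if (duration) return `SINCE ${Math.round(duration / 60000)} MINUTES AGO`;
  return fallback;
}

// Hypothetical query for the "top 100 transactions" list, scoped to the picker.
// No escaping of appName is attempted in this sketch.
function transactionListQuery(appName, timeRange) {
  return (
    `SELECT count(*) FROM Transaction WHERE appName = '${appName}' ` +
    `FACET name LIMIT 100 ${timeRangeToNrql(timeRange)}`
  );
}
```

With a 30 minute picker window, the query scans 30 minutes of events instead of a month, which should make timeouts much less likely on large accounts.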

Edit SLO definitions

Amend the Add functionality to support edits.

Add favorites to a listing / card view

Breakdown SLO Compliance Calculation by Transaction or Alert

It would be good to have a summary how each element in the SLO calculation contributes to the overall calculation - either in a details drilldown or some indication.

link to errors.md not working to setup error alert

https://github.com/newrelic/nr1-slo-r/blob/master/error_slos.md

link does not work

Error when editing an Error-based SLO

Description

When editing an existing Error SLO, I click on the three dots, then Edit. I add a new defect to the Error SLO and click Update Service; intermittently, the SLO doesn't update. When I click Edit on the same SLO again, the defect I added is not there.

Steps to Reproduce

See description above.

Expected Behaviour

When I add a new defect type to an existing Error SLO using the Edit option and click Update Service, any edits I make are persisted.

Relevant Logs / Console output

When I click on the Update Service button, I observed this error in the console

single-document.js:25 Uncaught (in promise) TypeError: e.map is not a function
    at c (single-document.js:25)
    at s (single-document.js:33)
    at single-document.js:65
    at c (runtime.js:45)
    at Generator._invoke (runtime.js:271)
    at Generator.T.forEach.e.<computed> [as next] (runtime.js:97)
    at c (runtime.js:45)
    at t (runtime.js:135)
    at runtime.js:170
    at new Promise (<anonymous>)

Your Environment

NR1 CLI version used:
@datanerd/nr1/1.10.10 darwin-x64 node-v10.16.3
Browser name and version:
Chrome 79
Operating System and version:
Mac OS 10.14.3

Additional context

SLO Group had multiple Error SLO's defined within it.

SLO/R Overview shows no SLOs defined, even though several are created

Description

When I launch the SLO/R Overview page, it shows no SLOs defined. The SLO Group dropdowns are empty, even though several SLOs and Groups have been created in NR1.

Steps to Reproduce

1 - Go to a service and define a new Errors SLO
2 - Name the group while defining the SLO
3 - Go to the main NR1 page, and click on the SLO/R Launcher
4 - No SLOs are showing

Expected Behaviour

Defined SLOs should show in the SLO/R Overview page.

NR1 CLI version used: 1.10.10 darwin-x64 node-v10.16.3
Browser name and version: Chrome
Operating System and version: MacOS

Additional Attachments

Create Calendar View for each SLO

Provide a Weekly / Monthly view for each SLO attainment calculation.

Summary

Provides a view of SLO attainment in the conventional calendar sense rather than the rolling current, 7 day, and 30 day window.

Desired Behaviour

Possible toggle between calendar view and the current rolling SLO calculations.

Possible Solution

...

Additional context

Most people will want to have a calendar view of SLO attainment.

Provide alerting on SLO budget consumption

Per feedback -

We need to monitor the amount of budget we have left, and also to alert on the rate at which that budget is consumed. We need to know if something is about to fall over outside of the norm.

defects don't seem to be making it into the SLO definitions

SLO as Code

Provide documentation and tooling for defining SLO definitions via GraphQL mutations and integrating them into a CI/CD pipeline.

View details needs to respect time range in NRQL output

v. 1 SLO/R

Complete milestone 1 https://github.com/newrelic/nr1-csg-slo-r/milestone/1

https://docs.google.com/document/d/1Mu9u1X3o6kcY8kZG4IXUxD51PH7FDh6YiNl8dzHUyU8/edit

Save Each Monthly/Weekly SLO Attainment in NerdStore

As the basis of providing a more comprehensive on-going report - save off monthly calculations in NerdStore and provide a series of reporting options.

Error indicator not using correct field

When setting up an Error indicator SLO, it is filtering only on the httpResponseCode field and not including the response.status one. For at least .NET Framework agents, the httpResponseCode field is not populated but the response.status one is.
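A possible shape for the fix, sketched under assumptions: the filter checks both attributes named above, so it matches whichever one the agent populates. The helper name is hypothetical, and the sketch assumes both attributes are stored as strings so that LIKE applies.

```javascript
// Attribute names come from the issue text; which one is populated
// depends on the agent (assumption: both hold string values).
const STATUS_ATTRIBUTES = ['httpResponseCode', 'response.status'];

// Hypothetical helper: build a NRQL condition that matches a status-code
// prefix (for example '5' for 5xx) on any of the candidate attributes.
function statusCondition(prefix) {
  const clauses = STATUS_ATTRIBUTES.map(
    (attr) => `\`${attr}\` LIKE '${prefix}%'`
  );
  return `(${clauses.join(' OR ')})`;
}
```

The backticks around each attribute are there because response.status contains a dot.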

Modify the team definition into org language

Configure for circle ci

Prerequisites:

Github Personal Access token with "public_repo" scope
Snyk API Token
CircleCI Personal API Token

Required setup items:

Sample CSV import file:

user,email,agreement
circleci[bot],[email protected],TRUE
@semantic-release-bot,[email protected],TRUE

Allow editing of SLO Result to indicate blackout windows or rejected defects

Provide a mechanism to take a given SLO result and post-edit it to annotate defects or alert periods that were part of expected blackout or items that should not apply to the SLO calculation.

Ensure these items are well documented and the revised SLO calculations appear with suitable annotation.

Auto Assign SLOs

Ability to scan services and auto assign SLOs for latency, availability, throughput and error budget.

Using historical data, take a running average to define the "99.5 percentile" and auto assign those numbers as targets. In this way, large organizations can quickly ramp up to speed and only adjust as necessary, rather than manually set up each process.
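As a rough illustration of deriving a suggested target from history, here is a nearest-rank percentile over a list of historical samples. The function name and the idea of feeding it raw latency samples are assumptions for the sketch; the issue does not specify the data source or the exact statistic.

```javascript
// Hypothetical helper: nearest-rank percentile of historical samples.
// Assumption: values is a non-empty array of numbers (e.g. latencies in ms).
function percentile(values, p) {
  if (values.length === 0) throw new Error('no historical data');
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length); // 1-based rank
  return sorted[Math.max(0, rank - 1)];
}

// Suggest a latency threshold that 99.5% of past samples stayed under.
function suggestLatencyTarget(samples) {
  return percentile(samples, 99.5);
}
```

The suggested value would only be a starting point that users adjust as necessary.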

Address alert violation overlaps

Currently, the code doesn't address overlaps in the alert violations for time attainment. We need to fix that.
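One way to handle this, sketched under assumptions: merge overlapping violation intervals before summing their duration, so time covered by two violations is only counted once. The { start, end } shape (epoch milliseconds) and the function names are hypothetical.

```javascript
// Assumption: each violation is { start, end } in epoch milliseconds.
function mergeViolations(violations) {
  const sorted = [...violations].sort((a, b) => a.start - b.start);
  const merged = [];
  for (const v of sorted) {
    const last = merged[merged.length - 1];
    if (last && v.start <= last.end) {
      // Overlaps (or touches) the previous interval: extend it.
      last.end = Math.max(last.end, v.end);
    } else {
      merged.push({ start: v.start, end: v.end });
    }
  }
  return merged;
}

// Percentage of the window that was NOT in violation.
function timeAttainment(violations, windowStart, windowEnd) {
  const violated = mergeViolations(violations).reduce((sum, v) => {
    const start = Math.max(v.start, windowStart);
    const end = Math.min(v.end, windowEnd);
    return sum + Math.max(0, end - start); // clip to the window
  }, 0);
  return 100 * (1 - violated / (windowEnd - windowStart));
}
```

For example, violations covering 0-10 and 5-15 count as 15 units of downtime after merging, not 20.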

Style the new SLO table

We're using a new, more capable, table component. However, since the switch the styling of the table has regressed. Fix that.

Provide definitions for SLO calculations

We need to define what the SLO is based on specifically, a period of time, a total capacity or consumption of error budget.

Two SLO's based upon Error indicator in the same SLO group overwrite the defects selected in the dropdown

Description

When creating two SLOs in the same SLO group, if both use the Error indicator, the defects selected in the dropdown (5xx errors, 401-unauthorised, etc.) are shared between the two SLOs. For example, if you wanted one SLO for 5xx errors and another for everything else, this is impossible, because when you change one the other is changed too.

Steps to Reproduce

Have two Error SLO's in a single SLO group. They have different transactions selected. Set one to use 5XX defects, the other to use 401 - Unauthorised. Then change one Error SLO and add a new defect. You should observe they now BOTH have this defect selected.

Expected Behaviour

I'd expect to be able to have two (or more) separate Error SLO's in an SLO group each scoped to different transactions and different defects (5xx, 401, 403 etc).

Relevant Logs / Console output

None unfortunately.

Your Environment

NR1 CLI version used:
@datanerd/nr1/1.10.10 darwin-x64 node-v10.16.3
Browser name and version:
Chrome 79 64 bit. Also observed on customer machine (versions unknown)
Operating System and version:
Mac OS 10.14.3

Additional context

None.

Automatically refresh the SLO's every 60 seconds

When creating new SLO's, make SLO Group a drop down with options of existing groups

Summary

It's hard to remember or know which groups you've already created. To aid the user experience and prevent lots of duplicated SLO groups from being created, you should be able to pick an SLO group from a dropdown when defining a new SLO.

Desired Behaviour

When defining a new SLO, the field for SLO Group should be a dropdown (if groups exist) or if not, allow user to create the first new group.

Possible Solution

Store groups in nerdstorage so the component for defining a new SLO can check to see if groups already exist, if so, display them in a dropdown.
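A small sketch of the dropdown side of this, assuming the stored SLO documents have already been loaded into an array and each carries its group name in a field called group (a hypothetical field name). It does not show the NerdStorage read itself.

```javascript
// Hypothetical helper: derive the dropdown options from stored SLO documents.
// Assumption: each document has an optional string field named `group`.
function existingGroups(sloDocuments) {
  const names = sloDocuments
    .map((doc) => (doc.group || '').trim())
    .filter((name) => name.length > 0);
  // De-duplicate, then sort for a stable dropdown order.
  return [...new Set(names)].sort((a, b) => a.localeCompare(b));
}
```

If the returned list is empty, the form could fall back to a free-text field so the user can create the first group.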

Additional context

It's a poor user experience where you have to remember the groups already existing and spell the group exactly right for the SLO to end up in the right group.

Alert Condition based SLOs

Summary

Currently you can only assign an alert driven SLO by policy. However that would mean you need a 1:1:1 on entity - condition - policy for these alerts to be accurate. Example - entity: app1, condition: throughput high, policy: app1 high throughput

Desired Behaviour

It would be great if we could select by condition under the policy.
Example - Policy: Backend, conditions: high CPU, high response time, low apdex

If you have policies grouped by app or function there needs to be a way to specify which condition we would like to pull out of the policy.

Possible Solution

Adding one more layer to the form to select the condition under the policy if needed. The condition name is being captured in the JSON payload from alerts already.

Implement config screen in a modal

If you don't have any SLO Alert Policies, point the user to docs

Use entity meta-data (tags / labels) to provide the overview orchestration

Summary

As shipped, SLO/R relies on the construct of an SLO Group (née organization, née team) to group multiple SLOs into one attainment. This adds an artificial construct that won't age well in New Relic, so it would be better to use the metadata that is already available on the entities as the basis for grouping. NR users can then just worry about organizing their entities with a proper taxonomy instead of having to redo it in SLO/R.

Desired Behaviour

Select from a list of available metadata, or begin typing the metadata, and you get an overview report for all the SLOs on all the entities that contain that metadata. We could allow for multiple tags narrowing the context.

Possible Solution

update the composite / organization query logic to take an array of applicable entities based on the various tags chosen.

Additional context

I think this would dramatically improve the flexibility to report overviews for SLOs.

Calculation Blackout Periods

The ability to specify a blackout period for an SLO definition so that known downtimes will be excluded from the calculations of SLO attainment.

Summary

Allows us to make better SLO designs that represent some of the variable aspects of time based SLO calculation.

Desired Behaviour

There should be a policy dialog within the SLO configuration, with the ability to specify a recurring policy or a one-off period of time. These should persist with the policy, probably as an array.

Possible Solution

As above: a dialog, plus a "blackouts" section in the slo.json
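To make the idea concrete, here is a sketch of what a blackouts array might look like and how one-off periods could be excluded from a calculation. Every field name here (blackouts, start, end, reason, timestamp) is an assumption for illustration, and recurring policies are not covered.

```javascript
// Hypothetical SLO definition with a blackouts section (one-off periods only).
const slo = {
  name: 'example-slo', // hypothetical
  target: 99.5,
  blackouts: [
    {
      start: '2020-02-01T02:00:00Z',
      end: '2020-02-01T04:00:00Z',
      reason: 'planned maintenance',
    },
  ],
};

// True when the timestamp (epoch ms) falls inside any blackout period.
function isBlackedOut(timestamp, blackouts = []) {
  return blackouts.some(
    (b) => timestamp >= Date.parse(b.start) && timestamp < Date.parse(b.end)
  );
}

// Drop events that occurred during a blackout before computing attainment.
function excludeBlackouts(events, blackouts) {
  return events.filter((e) => !isBlackedOut(e.timestamp, blackouts));
}
```

Keeping the reason alongside each period would also help with annotating the adjusted results.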

Additional context

Just want to make the most useful configurator evah!

Replace the README screenshots

View SLO Definition

Currently, it's not possible to review the SLO definition. It'd be nice to be able to do that.

per @AlecIsaacson

Need way to define a new SLO from the Launcher

Summary

The definition of an SLO linked to a simple entity is too opaque; we need to make it easier to get to the SLO definition. In the case of Alert-derived SLOs there is a really loose correlation between the entity and the Alert. So what's the point of limiting the definition of SLOs to the entity?

Desired Behaviour

It is easy to define everything you need for an SLO in one place.

Possible Solution

TBD - modification of the SLO definition experience

Additional context

Select / design icon for Launcher

Replace the table component

NRQL query for alert based SLO not correct

Description

After defining an SLO, it doesn't work. When viewing the details of the SLO, the NRQL has WHERE policy_name IN ('') which obviously won't work.

Steps to Reproduce

Define an SLO

Expected Behaviour

Should generate the correct NRQL.
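A defensive sketch for building that clause: drop empty names and refuse to generate NRQL when no policy is selected, rather than emitting IN (''). The function name is hypothetical, no quote escaping is attempted, and the root cause in the actual code is not known from this report.

```javascript
// Hypothetical helper: build the policy filter for an alert-based SLO.
// Assumption: policyNames is an array of policy name strings.
function policyFilter(policyNames) {
  const names = (policyNames || []).filter(
    (n) => typeof n === 'string' && n.trim() !== ''
  );
  if (names.length === 0) {
    // Fail early instead of producing WHERE policy_name IN ('')
    throw new Error('An alert-based SLO needs at least one alert policy');
  }
  const quoted = names.map((n) => `'${n}'`).join(', ');
  return `WHERE policy_name IN (${quoted})`;
}
```

Surfacing the error in the definition form would tell the user the policy selection was lost, instead of saving an SLO that can never match.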

Relevant Logs / Console output

Your Environment

NR1 CLI version used: @datanerd/nr1/1.2.2 win32-x64 node-v10.16.3
Browser name and version: Chrome
Operating System and version: Windows 10

Additional context

Review terminology

Per feedback

The easiest feedback we got was about how we named items. The SLIs are latency, throughput, uptime and error, and the objective (the SLO) is then defined as the target. We labelled the SLIs as "Type"; he said that was confusing and that the label should be "Indicator".

Also the term "error budget" as an SLO is not correct in that dropdown. It should be "errors" as that is the indicator.

The way he explained it was that if you have 100 transactions a day, the target would be that I want 90 of those 100 transactions to be error free. And the other 10 is the "budget" which is your SLO.

Color code the table cells based on attainment

Style the "view details" modal

It needs:

a heading
A section for NRQL that is either an accordion or tabs
That json output should be some pretty UI showing the definition of the document

Add ability to view details, edit, and delete from table view

You can do all 3 of them from the grid view, but not the table view. Right now I'm thinking we just add in that menu button into a new column in the table for each row.

Provide sort of SLOs

Sort SLOs - It'd be nice if the SLO tiles could be sorted by best / worst performing.

per - @AlecIsaacson
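A minimal sketch of the sort, assuming each SLO result carries a numeric attainment field (a hypothetical name) and that "worst performing" means lowest attainment.

```javascript
// Hypothetical helper: order SLO results by attainment.
// order = 'worst' puts the lowest attainment first; 'best' the highest.
function sortByAttainment(slos, order = 'worst') {
  const direction = order === 'best' ? -1 : 1;
  // Copy first so the caller's array (e.g. component state) is not mutated.
  return [...slos].sort((a, b) => direction * (a.attainment - b.attainment));
}
```

Sorting by distance from each SLO's own target might be more useful than raw attainment, but that would need the target in the same record.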

Open this NRQL in chart builder

Allow the NRQL behind an SLO definition to be automatically examined in Chart Builder.

See Graphiql Notebook or Datalyzer for a crib.

Add a link to the SLO definition docs to the UI

@ricegi is creating docs in the repo.

Add a link to it in the empty state
Add a link to it in the edit screen

Add a description to an SLO

display description in view details (4c0e5b3)
display description in grid card view
mouseover description in table view

Create SLO definition doc in the repo

View the SLO definition outside of an edit function

Alert defined for SLOs (Budget Perspective)

Summary

As SLOs are defined we should think about them in terms of their overall budget. If the attainment objective is 99.98 ... alerting on the rate of budget consumption versus the total amount of time remaining in the time period.

e.g. - Error SLO of 99.5 ... halfway through the measurement period we are at 99.6 attainment - meaning based on a straight line rate calculation for the SLO we are not going to make our time-bound objective.
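The example above can be worked through as follows. This sketch assumes attainment is measured against the whole period, so being at 99.6 halfway means 0.4 of the 0.5 points of budget are already spent. The function and field names are hypothetical.

```javascript
// Hypothetical straight-line projection of error-budget consumption.
// target and currentAttainment are percentages; elapsedFraction is 0..1.
function projectBudget(target, currentAttainment, elapsedFraction) {
  const budget = 100 - target; // total budget in percentage points
  const consumed = 100 - currentAttainment; // budget spent so far
  // Burn rate 1.0 means the budget runs out exactly at the end of the period.
  const burnRate = consumed / budget / elapsedFraction;
  const projectedAttainment = 100 - consumed / elapsedFraction;
  return {
    budget,
    consumed,
    burnRate,
    projectedAttainment,
    willMiss: projectedAttainment < target,
  };
}

// Target 99.5, currently 99.6, halfway through the period.
const projection = projectBudget(99.5, 99.6, 0.5);
```

For these inputs the burn rate is roughly 1.6 and the projected attainment roughly 99.2, below the 99.5 objective, which is why an alert on the rate would fire well before the budget is actually exhausted.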

Desired Behaviour

Alerts defined for SLOs that are sophisticated enough to execute the rate based budget consumption alerting for an SLO

Possible Solution

TBD

Additional context

Use time window and rate of consumption for the alerting context ...

7 day SLO alert
30 day SLO alert
Specific Month SLO alert

Tab b/w a card vs. tabular view

SLO/R Entity Nerdlet not refreshing on Entity select

When you are looking at SLOs in the SLO/R entity nerdlet and select a new entity in the breadcrumbs, the view does not update.

Description

see above

Steps to Reproduce

see above

Expected Behaviour

The context should switch and you should see the SLOs associated with the new entity you've selected.

Relevant Logs / Console output

N/A

Your Environment

N/A

Additional context

None

Launcher - Summary View

For each indicator (have a section per indicator):

render a summary "row" summarizing each indicator (in-memory using row-level data)
render a table or list of each SLO

// Some pseudo-code

render() {
  // One summary row and one SLO table per indicator
  return SLO_INDICATORS.map((indicator) => {
    return (
      <React.Fragment key={indicator}>
        <IndicatorSummary />
        <IndicatorTable />
      </React.Fragment>
    );
  });
}
