Comments (1)
The usual approach is to treat any request that misses the latency target as an outright error. That means using the le bucket approach and tailoring the le boundaries to your desired targets. The underlying motivation is explained here: https://grafana.com/blog/2019/11/27/kubecon-recap-how-to-include-latency-in-slo-based-alerting/.
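A minimal sketch of what the le approach could look like in a Sloth SLI follows. The metric name http_request_duration_ms is hypothetical, and the sketch assumes the histogram is recorded in milliseconds and already has a le="1000" bucket; adjust both to your own instrumentation.

```yaml
sli:
  events:
    # Errors: all requests minus those that finished within 1000 ms.
    # http_request_duration_ms_* is a hypothetical histogram metric.
    errorQuery: |-
      sum(rate(http_request_duration_ms_count[{{.window}}]))
      -
      sum(rate(http_request_duration_ms_bucket{le="1000"}[{{.window}}]))
    # Total: every request observed by the histogram.
    totalQuery: |-
      sum(rate(http_request_duration_ms_count[{{.window}}]))
```

Because both queries count requests, each slow request weighs the same in the error ratio, whatever the traffic level was at the time.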
Your histogram_quantile approach appears to calculate the error rate as the number of time slices in which the 98th-percentile latency exceeds 1s. To smooth the graphs, it might help to make the time slices as small as possible, something like:
errorQuery: |-
  sum_over_time(
    count(
      histogram_quantile(0.98,
        sum by (le) (rate(<expr>[30s]))
      )
    > 1000)
  [{{.window}}:])
Still, the drawbacks I see with such an approach are:
- It rates each time period as equally important (not taking traffic into account).
- I wouldn't trust histogram_quantile(..) > 1000 to report accurately unless I also have a le="1000" bucket.
- It seems like an expensive query to execute over large windows.
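To illustrate the bucket concern with made-up numbers: assume a histogram whose only finite boundaries are le="500" and le="2000", with 90 of 100 requests counted in le="500" and all 100 in le="2000". histogram_quantile interpolates linearly inside the bucket that contains the target rank, so the 0.98 estimate works out as:

```latex
\[
\hat{q}_{0.98}
  = 500 + (2000 - 500)\cdot\frac{0.98 \cdot 100 - 90}{100 - 90}
  = 500 + 1500 \cdot 0.8
  = 1700\ \text{ms}
\]
```

The estimate is 1700 ms even if the ten slower requests all took 600 ms, so the comparison against 1000 can be decided by interpolation alone. With a boundary at exactly 1000, the estimate exceeds 1000 only when the target rank is greater than the cumulative count in that bucket, which is an exact count rather than a guess.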