Comments (7)
Above issue is fixed, we can now target this for v2.
from slo-generator.
@ocervell Just to make sure we are on the same page, are we talking about this MQL?
@lvaylet yes!
@ocervell Taking `samples/stackdriver/slo_gae_app_availability.yaml` as an example, do you confirm that supporting MQL will let us write `backend.measurement.filter_good` and `backend.measurement.filter_valid` like:
```yaml
---
service_name: gae
feature_name: app
slo_description: Availability of App Engine app
slo_name: availability
slo_target: 0.95
backend:
  class: Stackdriver
  method: good_bad_ratio
  project_id: ${STACKDRIVER_HOST_PROJECT_ID}
  measurement:
    filter_good: >
      fetch gae_app
      | metric 'appengine.googleapis.com/http/server/response_count'
      | filter
          resource.project_id == '${GAE_PROJECT_ID}'
          && ( metric.response_code == 429 ||
               metric.response_code == 200 ||
               metric.response_code == 201 ||
               metric.response_code == 202 ||
               metric.response_code == 203 ||
               metric.response_code == 204 ||
               metric.response_code == 205 ||
               metric.response_code == 206 ||
               metric.response_code == 207 ||
               metric.response_code == 208 ||
               metric.response_code == 226 ||
               metric.response_code == 304 )
    filter_valid: >
      fetch gae_app
      | metric 'appengine.googleapis.com/http/server/response_count'
      | filter
          resource.project_id == '${GAE_PROJECT_ID}'
exporters:
  - class: Stackdriver
    project_id: ${STACKDRIVER_HOST_PROJECT_ID}
```
or even replace both fields with a new `ratio` field that leverages the native features of MQL, like:
```yaml
---
service_name: gae
feature_name: app
slo_description: Availability of App Engine app
slo_name: availability
slo_target: 0.95
backend:
  class: Stackdriver
  method: good_bad_ratio
  project_id: ${STACKDRIVER_HOST_PROJECT_ID}
  measurement:
    ratio: >
      fetch gae_app
      | metric 'appengine.googleapis.com/http/server/response_count'
      | filter resource.project_id == '${GAE_PROJECT_ID}'
      | { filter
            ( metric.response_code == 429 ||
              metric.response_code == 200 ||
              metric.response_code == 201 ||
              metric.response_code == 202 ||
              metric.response_code == 203 ||
              metric.response_code == 204 ||
              metric.response_code == 205 ||
              metric.response_code == 206 ||
              metric.response_code == 207 ||
              metric.response_code == 208 ||
              metric.response_code == 226 ||
              metric.response_code == 304 )
          ;
          ident
        }
      | ratio
exporters:
  - class: Stackdriver
    project_id: ${STACKDRIVER_HOST_PROJECT_ID}
```
Extra questions:
- Can I assume that the `good_bad_ratio` method of the `StackdriverBackend` class should be reused and adjusted to perform different operations based on the presence/absence of the YAML fields above? Or shall we come up with a new method with a different return value and/or return type, as `good_bad_ratio` currently returns a tuple with the number of good and bad events?
- Shall we introduce a new flag for the language used in the queries (one of `legacy` or `mql`, with a default value of `legacy` and a deprecation notice for `v3` in favor of MQL)? Or shall we detect the language automatically, on a best-effort basis? I would rather be explicit and go for the extra argument. Automatic detection could be tricky and not as simple as `.startsWith('fetch')`.
What do you think? Did you have something in mind already?
@lvaylet yes for your main question, that's exactly how we should be able to write MQL. The second way to do MQL with `ratio` looks good too. I think we can support both, but the ratio could be another method called `query_sli` (similar to what we need for the Prometheus backend here), and that one will return one value (the SLI) instead of a tuple (good, bad).
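A minimal sketch of the return-type contrast discussed above. The method names `good_bad_ratio` and `query_sli` come from this thread, but the bodies are purely hypothetical: the real implementations would fetch time series from the Cloud Monitoring API, whereas plain lists of numbers stand in for the query results here.

```python
# Hypothetical sketch contrasting the two contracts. Real
# implementations would query Cloud Monitoring; lists of numbers
# stand in for the returned time series.

def good_bad_ratio(good_points, valid_points):
    """Existing contract: return a (good_count, bad_count) tuple."""
    good = sum(good_points)
    valid = sum(valid_points)
    return good, valid - good

def query_sli(ratio_points):
    """Proposed contract for MQL `ratio` queries: return the SLI
    itself as a single float, e.g. the latest ratio value."""
    return ratio_points[-1]
```

For example, `good_bad_ratio([90], [100])` yields the `(90, 10)` tuple the SLO computation currently consumes, while `query_sli([0.93, 0.95])` hands back `0.95` directly.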
For your extra questions:
- Yes, I think we should be using the same `good_bad_ratio` method, even though it can call a different instance method under the hood (like `query_mql` instead of `query`) if we're using MQL.
- I think you're right about auto-detection, there are a lot of edge cases... I think a new flag, `lang=mql` or `lang=mqf` (MQF = Monitoring Query Filters), would be more explicit. If not passed, the flag defaults to `mqf` for now, until we deprecate it. Not sure when Monitoring Query Filters will be deprecated, but currently all our users use this instead of MQL, so we might even target > v4 for deprecation.
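One possible shape for that dispatch, as a sketch: the flag values `mqf`/`mql` and the method names `query`/`query_mql` are taken from this discussion, but the helper itself is hypothetical and not part of the existing backend.

```python
def select_query_method(backend_config):
    """Map the proposed `lang` flag to the backend method to call.

    Defaults to 'mqf' (Monitoring Query Filters) when the flag is
    absent, as discussed above. 'query' and 'query_mql' are the
    instance-method names mentioned in this thread.
    """
    lang = backend_config.get("lang", "mqf")
    methods = {"mqf": "query", "mql": "query_mql"}
    try:
        return methods[lang]
    except KeyError:
        raise ValueError(f"Unsupported query language: {lang!r}")
```

Being explicit like this keeps the error case trivial to report, whereas auto-detection would have to guess from query syntax.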
@ocervell As mentioned in googleapis/python-monitoring#47, support for MQL was added in `google-cloud-monitoring` version 2.2.0.
Unless there is a specific reason to target version 1.x.x of `google-cloud-monitoring` (like supporting Python 2.7), is it OK if I bump the `google-cloud-monitoring` version in `setup.py`? It is currently set to `'google-cloud-monitoring < 2.0.0'` for the `cloud_monitoring` and `cloud_service_monitoring` extras. Then I am not sure whether we should also bump the version of `google-api-python-client`, also set to `'google-api-python-client < 2.0.0'` for both extras?
Again in `setup.py`, we might have to bump the required Python version from `>=3.4` to `>=3.6`, as `google-cloud-monitoring` 2.0.0 requires Python 3.6+ (refer to the 2.0.0 Migration Guide for more details).
Upon investigation, bumping the version of `google-cloud-monitoring` to v2 requires significant updates to the existing code. As mentioned in the 2.0.0 Migration Guide:

> The 2.0 release of the google-cloud-monitoring client is a significant upgrade based on a next-gen code generator, and includes substantial interface changes. Existing code written for earlier versions of this library will likely require updates to use this version.

As a consequence, most of the Cloud Monitoring backend must be rewritten. For example, the static method `get_window(timestamp, window)`:
```python
@staticmethod
def get_window(timestamp, window):
    measurement_window = monitoring_v3.types.TimeInterval()
    measurement_window.end_time.seconds = int(timestamp)
    measurement_window.end_time.nanos = int(
        (timestamp - measurement_window.end_time.seconds) * 10**9)
    measurement_window.start_time.seconds = int(timestamp - window)
    measurement_window.start_time.nanos = measurement_window.end_time.nanos
    LOGGER.debug(pprint.pformat(measurement_window))
    return measurement_window
```
must be rewritten as:
```python
@staticmethod
def get_window(timestamp, window):
    seconds = int(timestamp)
    nanos = int((timestamp - seconds) * 10**9)
    measurement_window = monitoring_v3.TimeInterval({
        "end_time": {"seconds": seconds, "nanos": nanos},
        "start_time": {"seconds": int(seconds - window), "nanos": nanos},
    })
    LOGGER.debug(pprint.pformat(measurement_window))
    return measurement_window
```
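Note that the timestamp decomposition itself is unchanged between the two versions; only the way `TimeInterval` is constructed differs. A library-free sketch of the seconds/nanos split that both snippets rely on:

```python
def split_timestamp(timestamp):
    """Split a float epoch timestamp into (seconds, nanos), as both
    the v1 and v2 versions of get_window do before building the
    TimeInterval."""
    seconds = int(timestamp)
    nanos = int((timestamp - seconds) * 10**9)
    return seconds, nanos
```

For instance, `split_timestamp(1600000000.25)` gives `(1600000000, 250000000)`.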
At the end of the day, supporting MQL is not as simple as instantiating a new `QueryServiceClient` next to the existing `MetricServiceClient` and letting it handle the MQL queries.
In addition to the backend code, we might have to migrate the unit tests too, and maybe write more of them to make sure the whole backend is covered before such a major refactoring.
@ocervell As discussed offline, let's target a minor release like v2.1 if using the new client does not introduce any breaking change for the end users, or v3 in case we need to introduce breaking changes.